Utopia Talk - Politics - Seb on statistics

Nimatzo
iChihuaha Mon Aug 27 04:55:20
Correlation factor of 0.022 is robust evidence of causal mechanism.
- seb

http://www...hread=83151&time=1534599101645

http://www.utopiaforums.com/boardthread?id=politics&thread=83236&time=1535361445726

The gist of it being that seb posted a study on gender bias in acceptance rates of github pull requests. Huge sample size with statistically significant results and a correlation of 0.022. That women on average do better than men in having their requests accepted. YES, the reverse.

He is adamant that this is evidence of patriarchy, because in 1 sub class where requests are from outsiders, women's acceptance falls more than men's, they both fall.

Also the largest drop for both men and women, is when they do not have google+ accounts.

But still keep in mind r = 0.022 which is a quarter of the effect of other studies on bias.

And everyone knows I am not making this up, even if you didn't follow the threads. This is seb.

Here is the study:
http://peerj.com/articles/cs-111/

Any second opinions?

Seb
Member Mon Aug 27 05:23:20
Nim:

*Sigh*

As you know, the r published was for the aggregate across identified, unidentified, insider and outsider.

The paper itself pointed out that the effects were different for both those two additional variates.

So you *know* perfectly well the correlation for the combination will be poor as it mixes two pertinent variables.

Further, you admit here in this very post I based my argument on a subset of the data for which an r was not published.

So you *know* it's dishonest to use the r for the aggregate data to characterise a subset looking at the parameter of interest.
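
The point about pooling can be illustrated with a small sketch. The counts below are hypothetical, chosen only to show the mechanism, and are not taken from the paper; `phi` is a helper written for this example. Two subgroups each show a clear correlation between being identified and being accepted, but in opposite directions, and the pooled table shows none.

```python
import math

def phi(n_00, n_01, n_10, n_11):
    """Correlation between two 0/1 variables from a 2x2 table of counts.

    n_ab = count with identified == a and accepted == b.
    """
    num = n_00 * n_11 - n_01 * n_10
    den = math.sqrt((n_00 + n_01) * (n_10 + n_11) * (n_00 + n_10) * (n_01 + n_11))
    return num / den

# Hypothetical subgroup tables (NOT the paper's data).
outsiders = (30, 70, 40, 60)   # identified: 60% accepted, unidentified: 70%
insiders = (20, 80, 10, 90)    # identified: 90% accepted, unidentified: 80%
pooled = tuple(a + b for a, b in zip(outsiders, insiders))

r_outsiders = phi(*outsiders)
r_insiders = phi(*insiders)
r_pooled = phi(*pooled)
print(r_outsiders, r_insiders, r_pooled)
```

With these made-up numbers the subgroup correlations are of opposite sign and the pooled one vanishes, which is why an aggregate r says little about any one subgroup.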

What are you trying to achieve here other than demonstrate extraordinary dishonesty or stupidity?

Nimatzo
iChihuaha Mon Aug 27 05:43:23
It is not my fault that the researchers didn't make better calculations. And my bet is that they did, we can speculate over why they didn't present it. They are very open about their own bias and assumptions. Kudos to them.

You chose this study, and presented it as robust evidence for causal mechanism.

I also summarized what you think is important, I actually understand you better than you understand the results or my position.

I gave all the links to previous threads and the study. Full transparency and asked for second opinions.

I gave you a way out, I said if you only read the articles on gizmodo or vice, I forgive you, but you said no no I read the study.

That makes you the dishonest idiot, not me :)

Nimatzo
iChihuaha Mon Aug 27 06:24:31
So let's hear the authors' best explanations for their r = 0.02

Discussion
Why do differences exist in acceptance rates?
To summarize this paper’s observations:

1. Women are more likely to have pull requests accepted than men.

2. Women continue to have high acceptance rates as they do pull requests on more projects.

3. Women’s pull requests are less likely to serve an documented project need.

4. Women’s changes are larger.

5. Women’s acceptance rates are higher for some programming languages.

6. Men outsiders’ acceptance rates are higher when they are identifiable as men.

We next consider several alternative theories that may explain these observations as a whole.

Given observations 1–5, one theory is that a bias against men exists, that is, a form of reverse discrimination. However, this theory runs counter to prior work (e.g., Nafus, 2012), as well as observations 6.

Another theory is that women are taking fewer risks than men. This theory is consistent with Byrnes’ meta-analysis of risk-taking studies, which generally find women are more risk-averse than men (Byrnes, Miller & Schafer, 1999). However, this theory is not consistent with observation 4, because women tend to change more lines of code, and changing more lines of code correlates with an increased risk of introducing bugs (Mockus & Weiss, 2000).

Another theory is that women in open source are, on average, more competent than men. In Lemkau’s review of the psychology and sociology literature, she found that women in male-dominated occupations tend to be highly competent (Lemkau, 1979). This theory is consistent with observations 1–5. To be consistent with observations 6, we need to explain why women’s pull request acceptance rate drops when their gender is apparent. An addition to this theory that explains observation 6, and the anecdote described in the introduction, is that discrimination against women does exist in open source.

Assuming this final theory is the best one, why might it be that women are more competent, on average? One explanation is survivorship bias: as women continue their formal and informal education in computer science, the less competent ones may change fields or otherwise drop out. Then, only more competent women remain by the time they begin to contribute to open source. In contrast, less competent men may continue. While women do switch away from STEM majors at a higher rate than men, they also have a lower drop out rate than men (Chen, 2013), so the difference between attrition rates of women and men in college appears small. Another explanation is self-selection bias: the average woman in open source may be better prepared than the average man, which is supported by the finding that women in open source are more likely to hold Master’s and PhD degrees (Arjona-Reina, Robles & Dueñas, 2014). Yet another explanation is that women are held to higher performance standards than men, an explanation supported by Gorman & Kmec (2007) analysis of the general workforce, as well as Heilman and colleagues’ (2004) controlled experiments.
------------------

Many theories, not many answers. Welcome to social science.

So here is my position. STOP MAKING/SUPPORTING POLICY DECISIONS BASED ON SOCIAL SCIENCE STUDIES.

Clear?

And ^that includes Evolutionary Psychology. Keep studying and add more quantitative methods, then maybe in say 20-40 years we will have something useful.

This is your position:

I read something in Gizmodo about feminism. Let's see how this can be a matter of liability for companies and how we can destroy people based on things no one understands and that researchers caution us from reading too much into, because "muh values".

Seb
Member Mon Aug 27 06:49:49
Nim:

"It is not my fault that the researchers didn't make better calculations."

It is your fault if you take one calculation and knowingly pretend it's a calculation of something different.

It is the height of dishonesty, and a blatant demonstration of either stupidity or bad faith or both.

Seb
Member Mon Aug 27 06:50:52
Now, to the point:

"You chose this study, and presented it as robust evidence for causal mechanism."

It is. I have explained. What this shows is that on GitHub (which is the largest single and most commonly used repository for OS projects), a woman would have to work about 16% harder to generate the same number of accepted contributions to open source projects as someone who was not identifiable as a woman.

Given the prominence recruiters make of portfolio, even for entry level jobs - and the main way of building up this portfolio is via contributions to OS projects as outsiders - this indicates a significantly higher barrier for female coders attempting to obtain their first job.

You seem not to be following the chain of logic here - which is why I asked you if you understood what systematic bias means.

It doesn't matter precisely what causes this drop - and I agree it is hard to know the precise cause - but it does evidence strongly the fact that differences exist in acceptance ratios based purely on the presentation of the identity of the contributor; and this difference translates into a skew in recruitment.

The precise *cause* of imbalance between neutral and women isn't that important really. We've documented it exists and that it would have a consequence in hiring decisions.

Nimatzo
iChihuaha Mon Aug 27 07:06:45
"It is your fault if you take one calculation and knowingly pretend it's a calculation of something different."

I knowingly said this in the OP, I think it is fairly clear to anyone not named seb.

>>..and a correlation of 0.022. That women on average do better than men in having their requests accepted. YES, the reverse.

He is adamant that this is evidence of patriarchy, because in 1 sub class where requests are from outsiders, women's acceptance falls more than men's, they both fall.<<

You should ask the question, why they didn't calculate the effect size for the sub class you are interested in? With such abysmal effect, if they had something better, you think they wouldn't have published that?

Idiot.

Nimatzo
iChihuaha Mon Aug 27 07:09:43
Also don't waste energy, I stopped being interested and reading all of your posts on this in the first thread. I explicitly asked for second opinions, your opinions have been noted.

Seb
Member Mon Aug 27 08:52:03
Nimatzo:

"I knowingly said this in the OP, I think it is fairly clear to anyone not named seb."

Let me see if I can follow this lunacy.

The entire point of this thread is an accusation that the effect I am citing is weak, based off quoting r=0.02 for an entirely different statistic.

You claim you knew beforehand that this correlation coefficient does not relate to the statistic I'm citing.

You also claim that you did not intend to mislead people, and think it is very clear.

But you created this entire thread on the basis that I'm citing an effect with a low r, even though you are clear you know this is not the case as there is no published r for the statistic I published.

And you've repeatedly claimed again and again I cited a statistic with a low r, even though you now retract that claim.

Ok, so you don't actually have any evidence at all that I'm citing a weak effect.

I'm going to have to go with dishonest and stupid. At the very best this is a giant non-sequitur.

"You should ask the question, why they didn't calculate the effect size for the sub class you are interested in?"

Desire to publish - left for future work? Work can go on for ever. But it's pretty likely that the r will be higher as it's essentially the gradient of the best fit line, and one variate is a binary; and there is no way the variance can be greater than the aggregate dataset.

You zoom in on the r=0.02, but the other two sizing mechanisms are telling: the difference in the aggregate indicates gender is around 40% greater than the learning effect from going from 16 to 32 accepted pull requests; and double the effect of doubling the complexity of the request from 10 to 20.

You seem to be assuming a motivation and dishonesty in the researchers.

In any case, you seem to be confusing the size of the effect (which is what r measures) with the degree of certainty that it exists. And as the effect we are interested in is outsiders (early career) and neutral to female - and the causal mechanism that follows from the effect simply existing (rather than the causal mechanisms underlying that effect) - it seems a spectacularly blind alley to go down when trying to compare it to the strength of other effects.

The simple issue can be quantified directly: all other things being equal, women seeking entry level jobs on the strength of an OS portfolio would need to work 16% harder to generate the same portfolio if they were not identified as women.

Asking for second opinions when deliberately mis-characterising my argument and presenting a non-sequitur seems odd to me.

Garbage in, garbage out.

Nimatzo
iChihuaha Mon Aug 27 08:58:32
"Garbage in, garbage out."

Stop stealing my lines.

hood
Member Mon Aug 27 09:03:36
"Stop stealing my lines."

Take it as a point of pride. I have. You'll notice Seb uses "dishonest" a lot these days and didn't only a few months ago. Could almost bring one to tears, seeing them grow up and learn new words right before your eyes.

murder
Member Mon Aug 27 09:16:58

"Stop stealing my lines."

You realize that GIGO predates your birth, right?

"The first use of the term has been dated to a November 10, 1957"

http://en.wikipedia.org/wiki/Garbage_in,_garbage_out

Nimatzo
iChihuaha Mon Aug 27 10:03:42
Yes, but as hood explained, and as I have noticed too, Seb is not very good at flaming, or really anything. Specifically "garbage in garbage out" is something I have used to describe social science in general here.

Also how rude of you to assume my age. I identify as 102 years old.

Anyways, look at him post all that nonsense, without saying anything at all. Now I have to explain why he chose this study as his smoking gun. Go read the original thread, how sure he sounded,

seb said:
"I've pointed out concrete studies like the infamous (yes for you) GitHub pull request stats: where identical (no) pull requests are overwhelmingly likely to be accepted for women when requester is gender blind, but where there is a significant sex bias (no) when the requester's gender can be inferred (for men as well just less)."

Not identical requests, as anyone who has read the study can see:

(from the study)

"Women’s pull requests are less likely to serve an documented project need.

Women’s changes are larger."

Both of these things reduce the likelihood of acceptance.

We can go on, but there is no use.

Seb
Member Mon Aug 27 10:14:35
Hood:

Do you think perhaps I'm using very specifically your words to highlight your enormous hypocrisy?

I was calling Nim intellectually dishonest before the Ansari thread where you and delude started using it. The concept ain't new.

hood
Member Mon Aug 27 10:17:18
You think I started using the term since Ansari? Bro, your mind ain't right.

Seb
Member Mon Aug 27 10:24:15
Nim:

Have you now? It's a common rebuttal to overly complex analysis built on bad data.

You are also not looking at the study I was originally thinking of (which wasn't peer reviewed), that found a bug in a common but unsupported dependency and then submitted pull requests for a patch from two different profiles (like the French CV study demonstrating racial bias via inferred ethnicity of the candidate's name) and noted the acceptance rates of each.

I pointed out this wasn't the study I was originally thinking of.

"Both of these things reduce the likelihood of acceptance"
Except the whole point is that the "not identified as women" by their profile are still women as identified by their g+ account you freaking idiot!

You really don't understand the basics of this paper at all do you.

Seb
Member Mon Aug 27 10:25:22
Or are you once again deliberately raising a non-sequitur?

Nimatzo
iChihuaha Mon Aug 27 10:43:35
Post this non peer reviewed study, I am not one to dismiss unpublished data as Jergul is. They can be very important in specific areas as they can suffer from publication bias.

Just make sure they have calculated the (correct) effect size, otherwise, you can just throw it in the trash together with this one.

Nimatzo
iChihuaha Mon Aug 27 10:48:59
You are obviously too dumb, but I am just holding you to ONE standard, if you pass the ONE standard then I have no issues agreeing that "this is not good and we should do something about it".

When I am speculating and theorizing on UP about possible evolutionary explanations for behavior, I am aware of what I am doing. I suggest no I forbid anyone to make policy based on my speculations.

When you or other SJWs do the EXACT SAME THING, you think you have discovered some exact science 99% confidence interval, perfect correlation, this is how you talk and behave, everything is obvious and uhm OBVIOUS!!!! You want actions PROMPT. Line the fuckers up and start beheading them as murder would say.

Well hold on there missy, not so fast. How do you know? Uhm bla bla study, ok let's look. Oh it's garbage :/

So when you DO find it seb, given my superior values, of course I think something should be done about it. But standards, standards, my boy.

Seb
Member Mon Aug 27 14:06:54
Nim:

The effect size in this study is clear though. 16% more work required for women to build the same portfolio if they hadn't identified themselves on average.

However, because we are dealing with two dichotomous variables, and because the data is available in the supplementary tables, we can in fact calculate the r for the stat I am interested in. The r is -0.36 (being identified as a woman vs gender neutral is negatively correlated with acceptance).

Now, this r is still pretty meaningless. Plotting a correlation coefficient is pretty meaningless as I've pointed out as there are bound to be other factors that dominate an individual's acceptance which contribute to the variance. These will be hard to disentangle due to the binary nature of the variables (accepted, rejected; identified, unidentified).

You should also know that while the authors have used r here for comparison with other studies, this is not a great way of estimating the significance or power of an effect with this kind of data.

But hey, there you have it.

Seb
Member Mon Aug 27 14:17:09
So, in summary:

Article shows that there is a significant reduction (about 8% points, or an 11% reduction in acceptance rate) in the likelihood of women outsiders having their pull requests accepted if they can be identified as a woman.

This has a relatively strong r, and is very large (around 3 and 4 times respectively) compared to the other factors explored that affect PR acceptance rates: experience (i.e. number of prior PRs) and complexity (number of files in PR) - which cause effects of 3% and 2% (percentage points) when doubled over the ranges cited.

The r between identifiability and success is -0.35 if you are a woman, which is a strong effect by social science standards.

It corresponds to women identified as women having to work 16% harder to develop the same portfolio as if they were not identified as women on average.

Given the importance of a portfolio of OS contributions to successful job applications in entry level and early-career posts, this is a substantially un-level playing field.

Are you now happy?

Nimatzo
iChihuaha Mon Aug 27 14:27:42
No I am completely uninterested in your calculations.

http://pee...-that/#annotation-2002-replies

Our analysis (not in this paper -- we've cut a lot out to keep it crisp) shows that women are harder on other women than they are on men. Men are harder on other men than they are on women.

How does their analysis fit into your narrative?

jergul
large member Mon Aug 27 14:32:24
What is the N of women judging women?

Seb
Member Mon Aug 27 14:40:32
Nim:

Rewind. You have spent several threads, including this one, stressing the importance of r.

It's trivial actually to calculate r from the data they provide.

I've done so, it's actually quite large.

To suddenly say you are not interested in this is absurd and only further proves your dishonesty and stupidity.

Your other point is irrelevant. It's perfectly possible for individual women to share the prejudice that women are bad coders and that they themselves are exceptional, for example.

But we should not move on when you specifically created this thread to discuss statistics and the strength of this effect. We owe it to your past self to follow through without fear or favour even if your present self is now keen to avoid an embarrassing fact that runs counter to your narrative.

As Feynman said, the greatest thing in science is a beautiful theory slain by an ugly truth.

Seb
Member Mon Aug 27 14:44:50
Jergul:

That too. But it's actually not relevant. An individual woman's assessment of the bulk characteristic of women as a whole is not necessarily going to be more accurate or unbiased than a man's. Indeed on the inside, there are psychological motivators to adopt the prejudice of the majority, and selection mechanisms that mean women who do join the ingroup are selected for conforming to the ingroup's prejudices.

Nim's point is wholly illogical and infantile "but she did it too" nonsense that belongs in a playground, not an analysis of systematic biases.

Seb
Member Mon Aug 27 14:46:33
Nim, if you are uninterested in my calculation, perform your own.

It is trivial. It's just the determinant of the 2x2 matrix of n's for the two dichotomous variables, divided by the square root of the product of the row and column totals.

Nimatzo
iChihuaha Tue Aug 28 03:13:44
So BS vendor seb takes the figure he likes before the section:

"Are acceptance rates different if we control for covariates?"

Well is it? let's look at more "snippet of things" seb doesn't want to read :)

Study:

Programming languages:
"For programming languages, acceptance rates for three (Ruby, Python, and C++) are significantly higher for women, and one (PHP) is significantly higher for men."

Number of pull requests:
"We perform chi-squared tests and Benjamini–Hochberg corrections here as well. Compared to Fig. 3, most differences between genders diminish to the point of non-statistical significance.
...
Overall, women maintain a significantly higher acceptance rate beyond the first pull request, disconfirming the hypothesis."

[What they don't write but you can see from the data is, "One-Pull requests" from people who only ever submit one pull request is 2% higher for women.]

"We next investigate acceptance rate by gender and perceived gender using matched data. Here we match slightly differently, matching on identifiability (gendered, unknown, or neutral) rather than use of an identicon*."

Why do this?

Comment from author at the questions section.
*A big factor at play here is that 54% women use identicons, whereas only 31% of men do. This can partially explain why women in general have higher acceptance rate.
http://pee...rom-figure-5-seem-inconsistent

This comes at a loss of fidelity: the sample size decreases, so they relax the criteria for matching. But it just shows the inherent flaws and weakness in the study and the difficulty in the subject matter. The results are however:

"For outsiders, while men and women perform similarly when their genders are neutral, when their genders are apparent, men’s acceptance rate is 1.2% higher than women’s (χ2(df = 1, n = 419,411) = 7, p < .01)."

So after controlling for other variables (they could think of) we have a 1.2% "advantage" for men in the out group when gender is identifiable. And since the claim here is about an "uneven playing field" the interesting effect size is the difference between men and women when they are "noobs".

Now use your calculation skillz for this controlled 1.2% difference. Wait don't bother, a dishonest piece of shit is a dishonest piece of shit.

Nimatzo
iChihuaha Tue Aug 28 03:23:04
In ending, your ability to mimic a calculator likely surpasses mine, but so does your inability to control for your own bias and this staggering level of dishonesty.

See how quickly you question and dismiss the methodology for stereotype accuracy, but are completely blind to all the flaws and uncontrolled and unknown factors in this one? One of them happens to support your world view, the other doesn't.

This was the last time I wasted time discussing content with you.

Nimatzo
iChihuaha Tue Aug 28 03:33:48
jergul
We do not know, they did not release the numbers. It would of course be highly relevant and perhaps even more relevant, how much harder are men on other men relative to women on other women. Other studies show that men are more apathetic towards other men relative to women towards women. Women generally tend to have more, shall we say "opinions" about other women.

The speculative psychological answer is that when men get into other men's "business" there is a far greater likelihood for violence/conflict, so unless there is something to gain they don't. Now this risk obviously does not exist on github, but it would be part of a psychological decision process.

The intra-sexual dynamic is an important factor that is often forgotten.

Seb
Member Tue Aug 28 03:41:27
Nim:

r is the figure *you* like and keep quoting.

You keep returning to the differences between women and men; even though the substantial differences in their coding approach render that too uncontrolled a comparison (which is precisely why I'm not focusing on it).

But this is irrelevant - the figure of relevance is the difference between women whose github handle identifies them as women, and women whose github handle does not identify them as women.

As I have made abundantly clear.

So, bluntly, differences between men and women's languages and contributions are not relevant and to keep bringing them up is foolish. It's precisely why I'm not looking at those figures, but the most significant and large effect in the paper.

The figure of issue here is that simply being identified as a woman drops a woman's acceptance rate for outsiders. And this is systemically important as outlined above.

Nim, you are just embarrassing yourself now.

The only reason you have wasted time here is because you are pathologically unable to modify your views based on new information.

Seb
Member Tue Aug 28 03:43:11
Nim:

As pointed out in the paper, the dataset is open and they have published their algos. So it is not true to say they did not release the numbers.

So you can go and do that work yourself if you are so minded.

You are rather lazy in these matters.

Nimatzo
iChihuaha Tue Aug 28 05:50:12
"You keep returning to the differences between women and men"

Yes strangely when the issue raised is the uneven playing field for women relative to men, we "keep returning" to the actual group level differences between men and women. Then I ask all these irrelevant questions like, how big is it, what is the effect size etc.

"So, bluntly, differences between men and women's languages and contributions are not relevant"

Only the things that agree with my world view are relevant. Other factors and analysis that eliminate differences are not relevant. Even when the study shows these things are actually very relevant, they are not relevant.

That's all folks, no surprises, lead the horse to water and you may still end up shooting it in the head. *shrugs*

Seb
Member Tue Aug 28 06:01:22
Nim:

I've explained the uneven playing field.

It arises in the fact that entry level women who identify as women rather than remaining anonymous (and so are less able to verify themselves in job apps) would have to work 16% harder to obtain the same portfolio.

As you keep pointing out, comparing men to women directly doesn't work as the nature of their submissions is different. So the mind boggles why you keep trying to do so, and then attacking it. It's almost like a straw man! No, wait, it's *exactly* like a straw man.

You've asked how big the size of the effect is: it's an 8%-point reduction, or about 11% reduction in acceptance rate; which corresponds to 16% more pull requests being developed in order to get the same number of accepted contributions.
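
The three ways of stating the effect (percentage-point drop, relative drop, extra submissions needed) are linked by simple arithmetic, sketched below. The two acceptance rates are placeholders picked to give roughly an 8-point drop; the exact rates from the paper are not quoted in this thread, so the output should not be read as the paper's figures. `effect_sizes` is a name made up for this example.

```python
def effect_sizes(rate_unidentified, rate_identified):
    """Express a drop in acceptance rate three ways.

    Returns (point_drop, relative_drop, extra_submissions), where
    extra_submissions is the fractional increase in pull requests needed
    to end up with the same number of accepted ones.
    """
    point_drop = rate_unidentified - rate_identified
    relative_drop = point_drop / rate_unidentified
    # accepted = submitted * rate, so matching the accepted count needs
    # submitted to scale by rate_unidentified / rate_identified.
    extra_submissions = rate_unidentified / rate_identified - 1.0
    return point_drop, relative_drop, extra_submissions

# Placeholder rates (assumption, not from the paper).
points, relative, extra = effect_sizes(0.72, 0.64)
print(f"{points:.3f} {relative:.3f} {extra:.3f}")
```

The extra-work figure is always larger than the relative drop, because it is measured against the lower rate rather than the higher one.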

In comparison to other effects studied, it is around 2.5-3 times stronger in magnitude than the effect of doubling experience, and 4 times the magnitude of the effect of doubling complexity. So it is a large effect.

In terms of r, which I maintain is a strange choice, it's a correlation of -0.35 which is large by social science standards.

"Other factors and analysis that eliminate differences are not relevant."
The point is they don't eliminate the difference, because the difference is between identified and unidentified women who do not share the differences between men and women.

The whole point of your argument is that men is not a good control for women because the nature of their contributions can be shown to be different. This is why that comparison is flawed and irrelevant.

Hence, I am looking only at the difference between identified women and unidentified women, where differences in contribution are controlled for.

Further, the simple fact of the matter is that *for whatever reason* women who identify as women do seem to suffer a disadvantage compared to women who do not identify as women. That in turn is what drives the un-level playing field in recruitment.

In terms of *why* identified women should be less likely to be accepted - I am open to hearing plausible explanations, and why the effect should be so much more than men vs unidentified men.

Seb
Member Tue Aug 28 06:02:09
In general, you would think there would be *less* trust for unidentified individuals than identified ones. Yet it appears for women, there is a very large penalty in being identified.

Why should that be?

Nimatzo
iChihuaha Tue Aug 28 07:18:25
What a dumbass, you assume that being an insider is the baseline, because people are born as insiders on github and they "drop" in acceptance rate? lol completely fucking ass backwards.

You start as an outsider, then people make choices and decisions, that either makes them an insider or they drop out. What is the attrition rate for men vs women? Relevant to know. If there is any difference, what are the qualitative answers?

You ask about 1/100th of the questions you need to ask, when the results of a finding strokes your balls.

Go and read the study again, put on your big boy glasses and use that critical appraisal you said I don't appreciate. If you still think this study says something meaningful, other than as a springboard for further inquiry, shoot yourself in the head.

Seb
Member Tue Aug 28 08:17:32
Well, no.

I'm not using insider data at all. As I've repeatedly explained, the effect I'm looking at is the difference between outsider women who identify as such, and those that do not.

Repeating myself on this over and over again is getting tiresome.

Nim, when you are in a hole, it is best to stop digging.

Seb
Member Tue Aug 28 08:31:57
Nim, is it possible you have confused the insider/outsider variable with identified/unidentified variable?

I am looking simply at the acceptance rate between outsider/identified/female and outsider/unidentified/female populations.

The reason is that insiders are by definition more likely to be mid career (so GitHub contributions are less important, and the effect I'm pointing to is the impact on getting entry level jobs). Further, any females with insider status are likely to be past any filter created by stereotyping or bias. Insiders generally will have established the trust of the core project team. I've already pointed that out (twice I think) - and you appeared to criticise me then for focusing on a subset.

Now you appear to be chastising me for doing something that not only am I explicitly not doing, but which you previously took me to task for not doing.

It is really starting to look like you are completely unable to follow a simple, logical chain of reasoning. Which is very unusual for you, and not at all a pattern of behaviour that long time posters have observed in you over the last couple of years.

Nimatzo
iChihuaha Tue Aug 28 10:19:00
If you are not talking about insiders ”at all” then 1.2% is the difference between men and women when their gender is identifiable. You will have to deal with that narrative problem on your own.

Seb
Member Tue Aug 28 10:28:31
Nim:

But as you've pointed out, the difference between men and women wouldn't control for other effects like size of change etc.

Hence, as I repeatedly explain, the important and telling result showing the uneven playing field for entry level jobs faced by women is the difference between outsider/identified/female and outsider/unidentified/female.

As I've explained patiently to you so many times.

Seb
Member Tue Aug 28 10:29:19
Are you sure your microdosing hasn't affected your brain function?

Sam Adams
Member Wed Aug 29 00:27:50
Lol r=0.02 is basically perfectly unrelated. It's actually a little hard to get data that random...

So of course seb tries to stand on it.

Of course.

Seb
Member Wed Aug 29 05:21:07
Sam:

Only that's not the figure I stood on. The figure I stood on has an r of 0.35.

Finally, the point of r in this context is to size the effect, not to assign certainty.

cf. for example weak effects like incidence of bowel cancer due to consumption of processed meat. You can get high degrees of certainty (statistical significance) for weak effects (the actual impact of eating lots of processed meat is a small increase in risk for the individual) when you have large numbers.
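To illustrate the point about large numbers, here is a rough sketch using a standard two-proportion z-test. The rates (51% vs 50%) and sample sizes are made up for illustration and are not taken from the paper or from any cancer study; the function name is likewise just a label chosen here.

```python
import math

def two_proportion_z_test(p1, p2, n1, n2):
    """Two-sided z-test for a difference between two observed proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical: the same one-point difference, at two very different sample sizes.
z_small, p_small = two_proportion_z_test(0.51, 0.50, 1_000, 1_000)
z_large, p_large = two_proportion_z_test(0.51, 0.50, 1_000_000, 1_000_000)

print(f"n=1,000 per group:     z={z_small:.2f}, p={p_small:.3g}")
print(f"n=1,000,000 per group: z={z_large:.2f}, p={p_large:.3g}")
```

The effect size is identical in both runs; only the certainty about it changes with the sample size.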

Anyway, in this case, the important thing for you to understand Sam is that Nim admitted earlier on that he was deliberately quoting the wrong figure simply because it was available; even though the means to calculate the correct figure were provided in the paper.

Seb
Member Wed Aug 29 06:26:30
Also, as both variates are dichotomous, r is a really weird way of looking at the correlation.

accepted =0,1
identified = 0,1

You basically have a matrix of n_s for variable a,b

n_00, n_01
n_10, n_11

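r in this case boils down to

(n_00*n_11 - n_01*n_10) / sqrt(n_0. * n_1. * n_.0 * n_.1)

where n_0., n_1., n_.0 and n_.1 are the row and column totals of the matrix.

So to get a -0.5 correlation between being identified as a woman and being accepted would require, for example, an acceptance rate of 25% for identified, female outsiders vs 75% for unidentified, female outsiders (with equal numbers in each group).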

Which is a huge difference!

tl;dr it's just a fairly obscure way of looking at the difference in probabilities.
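A minimal sketch of that calculation, for anyone who wants to check it. The counts are hypothetical (1000 pull requests per group) and are not from the paper; the table is indexed here as n_ab with a = identified and b = accepted, which is an assumption about the notation, though the formula gives the same r if the two are swapped.

```python
import math

def phi_coefficient(n00, n01, n10, n11):
    """Pearson's r for two 0/1 variables, computed from a 2x2 table of counts.

    n_ab = number of pull requests with identified == a and accepted == b.
    """
    row0, row1 = n00 + n01, n10 + n11   # totals for identified = 0, 1
    col0, col1 = n00 + n10, n01 + n11   # totals for accepted = 0, 1
    denom = math.sqrt(row0 * row1 * col0 * col1)
    return (n00 * n11 - n01 * n10) / denom

# Hypothetical counts: 1000 unidentified PRs with 75% accepted,
# 1000 identified PRs with 25% accepted.
r = phi_coefficient(n00=250, n01=750, n10=750, n11=250)
print(round(r, 3))
```

With equal group sizes, a 50-point gap in acceptance rate is what it takes to reach r = -0.5.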

Seb
Member Wed Aug 29 06:27:08
super tl;dr - Nim is parroting words again without understanding what they mean.

Seb
Member Wed Aug 29 06:36:15
So let's say there was a "natural" average rate of 75% acceptance rate for women outsiders PRs.

Now let's assume that about 2/3rds of projects are run by people who will always reject any pull request that they know comes from a woman, but will otherwise behave rationally.

That would then get you to your r of -0.5.

I hope this demonstrates why r is a truly ludicrous way of assessing this.
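The toy model above can be sketched as follows. Assumptions, none of which come from the paper: half of all pull requests are from identified women, unprejudiced reviewers accept at the 75% base rate, and prejudiced reviewers reject every identified pull request. The function name and parameters are just labels chosen here.

```python
import math

def r_for_prejudiced_share(share, base_rate=0.75, identified_share=0.5):
    """Correlation between 'identified' and 'accepted' in the toy model."""
    p_unidentified = base_rate
    p_identified = base_rate * (1 - share)   # only unprejudiced reviewers accept
    q = identified_share
    p_overall = q * p_identified + (1 - q) * p_unidentified
    # Pearson's r for two 0/1 variables, written in terms of the two rates.
    return ((p_identified - p_unidentified) * math.sqrt(q * (1 - q))
            / math.sqrt(p_overall * (1 - p_overall)))

for share in (0.1, 0.2, 2 / 3):
    print(f"{share:.0%} prejudiced reviewers -> r = {r_for_prejudiced_share(share):.3f}")
```

Under these assumptions r tracks the gap in acceptance rates, so even a large share of prejudiced reviewers gives a number that looks modest next to what people expect from continuous data.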

Seb
Member Wed Aug 29 06:59:23
I.e. it is identical to expressing the difference in average acceptance probability, which is actually more meaningful.

hood
Member Wed Aug 29 08:23:37
http://ars...lts-replicate-its-complicated/

*Whistles innocently*

Seb
Member Wed Aug 29 08:38:37
Hood:

And your point is?

hood
Member Wed Aug 29 09:16:26
I'm providing pertinent supplemental commentary. I would think it worth the consideration if we're being objective.

Nimatzo
iChihuaha Wed Aug 29 09:16:39
Anyways, I had a short exchange with one of the authors about the results. In summary, there are many ways to look at the data, and they leave it ”up to the reader to find what is meaningful to them”. I think I will leave it at that. They give us a free hand to pick our own truth; in other words, inconclusive results.

I of course apply rigor and go by their (matched) results after controlling for other effects that may confound the outcome but aren’t gender bias (as they say in the study). seb chose the biggest number he could find, which is from before said controls.

Now if you understand anything about behavioral studies (or any study on humans) you also understand confounding factors and why it is important to control for them. Anything else will produce fundamentally flawed results that can not be trusted, but that can readily be used as ammo for the click bait factories.

On the matter of who is applying scientific rigor and thinking consistently, well the readers may decide that for themselves as well.

Sam Adams
Member Wed Aug 29 10:22:27
"Only that's not the figure I stood on. The figure I stood on has an r of 0.35."

So the most seb friendly subset of the data is pretty uncorrelated.

I am unimpressed.

Seb
Member Wed Aug 29 11:10:15
Hood:

Explain how you think this is pertinent and how we should reconsider these results in light of it.

Nim:
Retard, the fact they don't choose to draw further conclusions doesn't mean no further conclusions can be drawn or that any conclusion drawn is equally valid.

The overall result is uncontrolled, hence the low r. This is clear from the paper. Hence the importance of the result I'm quoting that is controlled.

It's abundantly clear you don't know what you are talking about.

Sam:

See above to understand why this is not like fitting to continuous data. When each data point can fall in only one of four possible points, all you are looking at is the 8% delta.

A correlation of -0.5 (what you'd use for continuous data) would mean women were two thirds less likely to be accepted if they disclosed their gender.

Seb
Member Wed Aug 29 11:12:55
Which in turn could be explained in a model where say 60% of reviewers rejected any pr they knew to be coming from a woman where they would otherwise have a 75% probability of accepting it.

That's an enormous bias but leads to a "marginal" r of -0.5.

Sam Adams
Member Wed Aug 29 14:07:24
Seb trying to think up a bunch of excuses why r=.35 is more robust than it is... has a history of trying to come up with ways to explain away the robustness of crime rate and black population (r=0.9).

Hmmm

Seb
Member Wed Aug 29 17:37:53
Sam:

I just understand what r actually means for dichotomous variables and how degrees of freedom work.

A better way of doing this would be to use a bootstrap resampling method. That would give you the uncertainty on the delta.

As it is r here is just the delta of the means which tells you nothing.

Sam Adams
Member Wed Aug 29 17:52:47
Rofl.

jergul
large member Thu Aug 30 04:56:46
"the fact they don't choose to draw further conclusions doesn't mean no further conclusions can be drawn or that any conclusion drawn is equally valid"

This is sort of the key to understanding the value of research.

Data becomes public domain. The original researchers have first dibs at interpretation, then it is open hunting season.

This is why I sometimes read the paper. Because data is one thing and interpretation something else entirely.

Seb
Member Thu Aug 30 05:14:34
Sam:

Yes, well we already knew you have a singular approach to statistics: samistics.

Sam Adams
Member Thu Aug 30 10:53:34
Seb stats: r=0.35 is meaningful, r=0.9 is not.

jergul
large member Thu Aug 30 12:02:37
Sammy
That is not stats. That is interpretation.

For example, what is r equal when other factors than race are accounted for?

Sam Adams
Member Thu Aug 30 12:07:02
"That is shitty interpretation."

Fixed it for you.

Seb
Member Thu Aug 30 12:15:37
Sam:

Depends on your covariance.

If A causes B and C, B and C will appear correlated in a sample where A is uncontrolled.
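A quick simulated illustration of that, with entirely made-up data: A is a coin flip, and B and C each depend on A plus independent noise. B and C have no direct link, yet they correlate until A is held fixed. The noise level and sample size are arbitrary choices.

```python
import math
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

rng = random.Random(1)          # fixed seed so the run is repeatable
rows = []
for _ in range(20000):
    a = rng.randint(0, 1)       # the common cause
    b = a + rng.gauss(0, 0.5)   # B depends only on A plus noise
    c = a + rng.gauss(0, 0.5)   # C depends only on A plus noise
    rows.append((a, b, c))

# A uncontrolled: pool everything together.
r_all = pearson_r([b for _, b, _ in rows], [c for _, _, c in rows])

# A controlled: look only at the rows where A == 0.
stratum = [(b, c) for a, b, c in rows if a == 0]
r_a0 = pearson_r([b for b, _ in stratum], [c for _, c in stratum])

print(f"A uncontrolled: r = {r_all:.2f}")
print(f"A fixed at 0:   r = {r_a0:.2f}")
```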

In this case however, we've controlled for most variables.

And because we have only two variables, both of which are dichotomous, r is just a simple function of the gradient of the line of best fit.

So the effect of, say, having 10%, 20% etc. of code reviewers being prejudiced changes the value of r.

(I.e. if we started constructing dummy data where we actually *know* the causal mechanisms, we discover r tells us nothing about signal to noise because essentially you are fitting a line between four points).

Hence, if the goal is to understand the likelihood the effect is real, use a resampling method like bootstrap to generate n randomly chosen data sets from the data you have. The variance in the mean differences calculated corresponds to the confidence interval you have for the true mean of the distribution.
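A bare-bones sketch of the bootstrap described above, using a percentile interval. The outcome lists are invented (62% vs 70% acceptance, 1000 pull requests each) and are not the paper's figures; the function name, resample count and seed are arbitrary choices made here.

```python
import random

def bootstrap_rate_difference(identified, unidentified, n_resamples=2000, seed=0):
    """95% percentile interval for the difference in acceptance rates.

    Each input is a list of 0/1 outcomes (1 = pull request accepted).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        a = rng.choices(identified, k=len(identified))
        b = rng.choices(unidentified, k=len(unidentified))
        diffs.append(sum(a) / len(a) - sum(b) / len(b))
    diffs.sort()
    low = diffs[int(0.025 * n_resamples)]
    high = diffs[int(0.975 * n_resamples) - 1]
    return low, high

# Made-up outcomes: 62% accepted when identified, 70% when not.
identified = [1] * 620 + [0] * 380
unidentified = [1] * 700 + [0] * 300

low, high = bootstrap_rate_difference(identified, unidentified)
print(f"95% interval for the difference: [{low:.3f}, {high:.3f}]")
```

If the interval excludes zero, the gap is unlikely to be a sampling accident, which is the question r does not answer here.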

If you have continuous data, then yes, Pearson's is much more useful

Sam Adams
Member Thu Aug 30 12:40:42
I see what you are saying, or trying to say. But you have argued time and time again that certain groups don't even commit those crimes, regardless of reason, despite an r of 0.9.

Which of course is wrong.

jergul
large member Thu Aug 30 13:46:50
Sammy
What is a stronger predictor? Race or socio-economic class?

Or to put it another way. Who is more likely to be convicted of a felony; an upper class black woman, or a white trash white guy?

(note the correct use of the semi-colon).

Sam Adams
Member Thu Aug 30 14:05:08
"Race or socio-economic class?"

Race is, though both are strong predictors.

Cherub Cow
Member Thu Aug 30 14:50:37
"(note the correct use of the semi-colon)"

Was this some sort of inside joke where that was known to not actually be a correct usage? ... because that was *not* a correct usage. Firstly, it should be "an upper class black woman or a white trash white guy?" — no comma before "or". That said, "an upper class black woman or a white trash white guy?" is not an independent clause, and semi-colons link independent clauses, so you would *not* use a semi-colon. You may be thinking of semi-colon usage for lists which use commas (though this has its own requirements), so maybe the mistaken addition of a comma caused the misuse of a semi-colon here. Whatever the source of the mistake, you would use a colon for this sentence because a colon goes before the introduction of a list. The corrected sentence:
"Who is more likely to be convicted of a felony: an upper class black woman or a white trash white guy?"

jergul
large member Thu Aug 30 15:57:01
CC
First off, if you use a colon, then you capitalize: Who is more likely to be convicted of a felony: An upper class...

Secondly, we shall agree to disagree.

Thirdly, I felt your plan not to engage with me was a good one. I find detached kewl-kid cynicism to be endlessly tiresome.

jergul
large member Thu Aug 30 15:59:56
Sammy
We could go on from there to find a reasonable interpretation that would render r quite meaningless.

hood
Member Thu Aug 30 16:10:51
Unless you're speaking in some weird non-english-that-uses-english-words language, your use of a semi-colon is inaccurate. There's no agreement to disagree; you're just wrong.

note the correct use of a semi-colon.

jergul
large member Thu Aug 30 16:23:16
Hood
We too shall agree to disagree. But feel free to get back to me when you get a clue.

Sam Adams
Member Thu Aug 30 17:32:19
"We could go on from there to find a reasonable interpretation that would render r quite meaningless."

Lol jergul and seb are the axis of wrong

jergul
large member Thu Aug 30 17:48:38
Sammy
If we remove "poor" and "male", then what would r look like?

Sam Adams
Member Thu Aug 30 17:56:13
There would still be a good correlation between african women and crime, relative to white and asian women

Next.

jergul
large member Thu Aug 30 18:24:58
For whatever trivial single digit percentage of all crimes that might be.

Wrath of Orion
Member Thu Aug 30 22:02:12
"First off, if you use a colon, then you capitalize: Who is more likely to be convicted of a felony: An upper class..."

No, you do not capitalize if the clause following a colon is dependent. If the clause is independent, some will capitalize and others will not.

Cherub Cow
Member Thu Aug 30 23:01:24
“There's no agreement to disagree; you're just wrong.”

+1
Jergul is incorrect in this case. This is English grammar, not “I want to be correct and have declared it so, therefore it is so” grammar (like poem grammar maybe?). I’d be happy to post sources and wait until the sun dies for Jergul to post any. Jergul’s kewl kid cynicism must be creating some anti-intellectual angst or something (given that “kewl kid cynicism” apparently has zero meaning).

Seb
Member Fri Aug 31 01:34:48
Sam:

You failed to control for other effects, and lost the plot completely when confronted with cities *you* thought were demographically comparable but had vastly different crime rates - highly suggestive that other factors were at play.

Seb
Member Fri Aug 31 01:35:55
Jergul:
"Thirdly, I felt your plan not to engage with me was a good one. I find detached kewl-kid cynicism to be endlessly tiresome."

+1 deer

jergul
large member Fri Aug 31 02:57:47
CC
We shall continue to agree to disagree. Aware as we both are; semicolons have many functions.

And engaging by proxy just makes you seem a drama king.

GG.

jergul
large member Fri Aug 31 03:03:00
WoO
The dependency actually relates to the semicolon disuse. If you do not feel it right to capitalize, then you should probably be using a semicolon, not a colon. Its mostly an academic thing; where semicolons matter. Grammar schoole teachers don't make the grade. So to speak.

jergul
large member Fri Aug 31 03:04:37
My point was not that I think you are a grade school teacher, but rather that most people learn their punctuation at that level. Those who can; do. Those who can't; teach.

Brainy UPer
Member Fri Aug 31 05:02:31
"First off, if you use a colon, then you capitalize: Who is more likely to be convicted of a felony: An upper class..."

No.

"We covered many of the fundamentals in our writing class: grammar, punctuation, style, and voice."

jergul
large member Fri Aug 31 05:11:05
"We covered many of the fundamentals in our writing class; grammar, punctuation, style, and voice."

Fixed!

jergul
large member Fri Aug 31 05:11:44
Alternatively:

"We covered many of the fundamentals in our writing class: Grammar, punctuation, style, and voice."

Brainy UPer
Member Fri Aug 31 05:15:23
Wrong.

http://wri...semi-colons-colons-and-dashes/

jergul
large member Fri Aug 31 05:22:33
We shall agree to disagree!

jergul
large member Fri Aug 31 05:24:35
And University of North Carolina?? Really? Appropriate perhaps for banjo making classes.

Brainy UPer
Member Fri Aug 31 05:44:20
Ah, so that is it. Still be corrected, continue to deny it, and attack the institution.

Nimatzo
iChihuaha Fri Aug 31 05:54:58
Seb
Ok let’s do this slowly.

Figure 11, B (Gender neutral). Are the differences in that column, statistically significant? No, as you can read in the data and the text below it isn’t, p>0.05

Figure 11, B (gendered). Are the differences in that column statistically significant? Yes. And it amounts to a gigantic 1.2%.

So that means delta for men is 6.8%? Possibly. We can’t say that it isn’t, but we can say that anyone who uses these numbers as evidence for something, is a fucking idiot.

Or simply as I said, 1.2% is the only number that is ”useful” here as it is the only significant difference. But ultimately meaningless since the effect is close to zero.

Let’s look at it this way: the total number of merges in B (neutral) is about 4500 vs the 418 000 in B (gendered). That is roughly 1% of the data in B (gendered), or about 0.2% of the 2.5 million this study deals with (after they discard 55% of the data to begin with).

So we are looking at a fraction of a fraction of a fraction of github now. This is actually getting close to having anecdotal evidence as the standard of evidence.

Did you read me say that after the controls, the results suffered in fidelity? Yes they took a toll.

I told you from the very start, this study doesn’t say anything useful for your crusade against men. This was obvious from the fact that the authors published r =0.022 that while ridiculously low and prior to analysis and controls is actually the largest effect size that is reliable _statistically_.

TL;DR seb has found his answer between statistically insignificant numbers and the _statistically_ significant 1.2% difference.

Grats seb, you are even more retarded than anyone gave you credit for.

————
This article is supplementary to what Hood posted. It provides some detail about the shenanigans found in social science, and in this study.

http://www...ology-studies-are-weak/568630/

jergul
large member Fri Aug 31 07:06:33
UPer
Nothing wrong with banjo work classes.

Brainy UPer
Member Fri Aug 31 07:21:31
Jergul,

Nothing wrong with having humility and being corrected.

jergul
large member Fri Aug 31 07:25:58
Brainy UPer
Does the University of North Carolina teach classes in that too?

Brainy UPer
Member Fri Aug 31 07:29:46
Perhaps you should explore what courses are offered. This may assist you. You're welcome.

jergul
large member Fri Aug 31 07:42:44
http://www.youtube.com/watch?v=gsC4kf6x_Q0

Done!

Cherub Cow
Member Fri Aug 31 08:34:48
Oh, I see. Jergul is just employing some kewl kid cynicism to troll. No one is actually stupid enough to see evidence yet still think that that semi-colon misuse is grammatically correct; Jergul is just having a bit of fun by stating something completely and verifiably incorrect and getting people to waste energy correcting it, all while continuing to make further grammatical mistakes such as the incorrect “it’s” form, a missing dash in “banjo-making”, and more incorrect semi-colon usages. That’s a type of Sebbish dishonesty that warrants further avoidance.

Seb
Member Fri Aug 31 09:54:46
Nim, if you could love straw men any more than you do, your dick would have splinters.

As figure 11 shows (using matched data), the drop between identified/unidentified male outsiders is quite small.

To remind you, you've already pointed out yourself you can't compare men to women directly because their PRs differ statistically.

The only thing that really tells us about potential biases in acceptance is the change in acceptance rate when the gender is known or unknown.

And it's much larger for women than men.

Nimatzo
iChihuaha Fri Aug 31 10:08:27
Seb the insignificant will never give up. Grasping after straws is his game.

hood
Member Fri Aug 31 10:23:23
"Nim, if you could love straw men anymore than you do, your dick would have splinters."

I know we make fun of you for being completely inept at the insult game, but credit where credit is due. 5/10, marginally clever and original.
