Thursday, December 28, 2023

40% of US electricity is now emissions-free


Just before the holiday break, the US Energy Information Administration released data on the country's electrical generation. Because of delays in reporting, the monthly data runs through October, so it doesn't provide a complete picture of the changes we've seen in 2023. But some of the trends now seem locked in for the year: wind and solar are likely to be in a dead heat with coal, and all carbon-emissions-free sources combined will account for roughly 40 percent of US electricity production.

Tracking trends

Having data through October necessarily provides an incomplete picture of 2023. There are several factors that can cause the later months of the year to differ from the earlier ones. Some forms of generation are seasonal—notably solar, which has its highest production over the summer months. Weather can also play a role, as unusually high demand for heating in the winter months could potentially require that older fossil fuel plants be brought online. It also influences production from hydroelectric plants, creating lots of year-to-year variation.

Finally, everything's taking place against a backdrop of booming construction of solar and natural gas. So, it's entirely possible that we will have built enough new solar over the course of the year to offset the seasonal decline at the end of the year.

Let's look at the year-to-date data to get a sense of the trends and where things stand. We'll then check the monthly data for October to see if any of those trends show indications of reversing.

The most important takeaway is that energy use is largely flat. Overall electricity production year-to-date is down by just over one percent from 2022, though demand was higher this October compared to last year. This is in keeping with a general trend of flat-to-declining electricity use as greater efficiency is offsetting factors like population growth and expanding electrification.

That's important because it means that any newly added capacity will displace the use of existing facilities. And, at the moment, that displacement is happening to coal.

Can’t hide the decline

At this point last year, coal had produced nearly 20 percent of the electricity in the US. This year, it's down to 16.2 percent, and only accounts for 15.5 percent of October's production. Wind and solar combined are presently at 16 percent of year-to-date production, meaning they're likely to be in a dead heat with coal this year and easily surpass it next year.

Year-to-date, wind is largely unchanged since 2022, accounting for about 10 percent of total generation, and it's up to over 11 percent in the October data, so that's unlikely to change much by the end of the year. Solar has seen a significant change, going from five to six percent of the total electricity production (this figure includes both utility-scale generation and the EIA's estimate of residential production). And it's largely unchanged in October alone, suggesting that new construction is offsetting some of the seasonal decline.

Coal is being squeezed out by natural gas, with an assist from renewables.


Eric Bangeman/Ars Technica

Hydroelectric production has dropped by about six percent since last year, causing it to slip from 6.1 percent to 5.8 percent of the total production. Depending on the next couple of months, that may allow solar to pass hydro on the list of renewables.

Combined, the three major renewables account for about 22 percent of year-to-date electricity generation, up about half a percentage point since last year. They're up by even more in the October data, placing them well ahead of both nuclear and coal.

Nuclear itself is largely unchanged, allowing it to pass coal thanks to the latter's decline. Its output has been boosted by a new 1.1-gigawatt reactor that came online this year (a second at the same site, Vogtle in Georgia, is set to start commercial production at any moment). But that's likely to be the end of new nuclear capacity for this decade; the challenge will be keeping existing plants open despite their age and high costs.

If we combine nuclear and renewables under the umbrella of carbon-free generation, then that's up by nearly a percentage point since 2022 and is likely to surpass 40 percent for the first time.

The only thing that's keeping carbon-free power from growing faster is natural gas, which is the fastest-growing source of generation at the moment, going from 40 percent of the year-to-date total in 2022 to 43.3 percent this year. (It's actually slightly below that level in the October data.) The explosive growth of natural gas in the US has been a big environmental win, since it creates the least particulate pollution of all the fossil fuels, as well as the lowest carbon emissions per unit of electricity. But its use is going to need to start dropping soon if the US is to meet its climate goals, so it will be critical to see whether its growth flatlines over the next few years.

Outside of natural gas, however, all the trends in US generation are good, especially considering that the rise of renewable production would have seemed like an impossibility a decade ago. Unfortunately, the pace is currently too slow for the US to have a net-zero electric grid by the end of the decade.



from Hacker News https://ift.tt/ws1d056

Linux is the only OS to support diagonal PC monitor mode

Here's a fun tidbit — Linux is the only OS to support a diagonal monitor mode, which you can customize to any tilt of your liking. Latching onto this possibility, a Linux developer who grew dissatisfied with the extreme choices offered by the cultural norms of landscape or portrait monitor usage is championing diagonal mode computing. Melbourne-based xssfox asserts that the “perfect rotation” for software development is 22° (h/t Daniel Feldman).

Many PC enthusiasts have strong preferences for monitor setups. Some prefer ultrawides and curved screens, and others seek out squarer aspect ratios with flat screens. Multiple monitors are popular among power users, too. But what if you have an ultrawide and find the landscape or portrait choices too extreme? Xssfox was in this very situation and decided to use her nicely adjustable stand and the Linux xrandr (x resize and rotate) tool to try to find the ultimate screen rotation angle for software development purposes.

Xssfox devised a consistent method to appraise various screen rotations, working through the staid old landscape and portrait modes, before deploying xrandr to test rotations like the slightly skewed 1° and an indecisive 45°. These produced mixed results of questionable benefits, so the search for the Goldilocks solution continued.

It turns out that a 22° tilt to the left was the sweet spot for xssfox. This rotation delivered the best working screen space on what looks like a 32:9 aspect ratio monitor from Dell. “So this here, I think, is the best monitor orientation for software development,” the developer commented. “It provides the longest line lengths and no longer need to worry about that pesky 80-column limit.”

If you have a monitor with the same aspect ratio, the 22° angle might work well for you, too. However, people with other non-conventional monitor rotation needs can use xssfox’s JavaScript calculator to generate the xrandr command for given inputs. People who own the almost perfectly square LG DualUp 28MQ780 might be tempted to try ‘diamond mode,’ for example.
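Under X11, arbitrary angles work through xrandr's --transform option, which takes a row-major 3×3 matrix; a pure rotation by angle θ is [cos θ, −sin θ, 0; sin θ, cos θ, 0; 0, 0, 1]. Here's a rough sketch of what such a calculator does (the output name DP-1 is a placeholder, and a real setup also needs --fb and translation terms, omitted here, to keep the rotated screen visible):

```python
import math

def xrandr_rotate_cmd(output: str, angle_deg: float) -> str:
    """Build an xrandr command rotating `output` by angle_deg degrees.

    Sketch only: omits the --fb framebuffer size and the translation
    entries of the matrix that a usable diagonal setup requires.
    """
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    # Row-major 3x3 rotation matrix, as xrandr --transform expects.
    m = [c, -s, 0, s, c, 0, 0, 0, 1]
    matrix = ",".join(f"{v:.6f}" for v in m)
    return f"xrandr --output {output} --transform {matrix}"

print(xrandr_rotate_cmd("DP-1", 22))
```

For xssfox's 22° tilt, the first matrix entry comes out to cos 22° ≈ 0.9272.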

We note that Windows users with AMD and Nvidia drivers are currently shackled to applying screen rotations in 90° steps. macOS users apparently face the same restrictions.



from Hacker News https://ift.tt/rkovDj1

Autorize – Authorization enforcement detection extension for Burp Suite

Autorize

Autorize is an automatic authorization enforcement detection extension for Burp Suite. It was written in Python by Barak Tawily, an application security expert. Autorize was designed to help security testers by performing automatic authorization tests. As of the latest release, Autorize can also perform automatic authentication tests.


Installation

  1. Download Burp Suite (obviously): http://portswigger.net/burp/download.html
  2. Download Jython standalone JAR: http://www.jython.org/download.html
  3. Open Burp -> Extender -> Options -> Python Environment -> Select File -> Choose the Jython standalone JAR
  4. Install Autorize from the BApp Store or follow these steps:
  5. Download Autorize source code: git clone git@github.com:Quitten/Autorize.git
  6. Open Burp -> Extender -> Extensions -> Add -> Choose Autorize.py file.
  7. See the Autorize tab and enjoy automatic authorization detection :)

User Guide - How to use?

  1. After installation, the Autorize tab will be added to Burp.
  2. Open the configuration tab (Autorize -> Configuration).
  3. Get your low-privileged user authorization token header (Cookie / Authorization) and copy it into the textbox containing the text "Insert injected header here". Note: Headers inserted here will be replaced if present or added if not.
  4. Uncheck "Check unauthenticated" if the authentication test is not required. (This test sends the request without any cookies, to check for authentication enforcement in addition to the authorization enforcement checked with the low-privileged user's cookies.)
  5. Check "Intercept requests from Repeater" to also intercept the requests that are sent through the Repeater.
  6. Click on "Intercept is off" to start intercepting the traffic in order to allow Autorize to check for authorization enforcement.
  7. Open a browser and configure the proxy settings so the traffic will be passed to Burp.
  8. Browse to the application you want to test with a high privileged user.
  9. The Autorize table will show you the request's URL and enforcement status.
  10. It is possible to click on a specific URL and see the original/modified/unauthenticated request/response in order to investigate the differences.

Authorization Enforcement Status

There are 3 enforcement statuses:

  1. Bypassed! - Red color

  2. Enforced! - Green color

  3. Is enforced??? (please configure enforcement detector) - Yellow color

The first 2 statuses are clear, so I won't elaborate on them.

The 3rd status means that Autorize cannot determine if authorization is enforced or not, and so Autorize will ask you to configure a filter in the enforcement detector tabs. There are two different enforcement detector tabs, one for the detection of the enforcement of low-privileged requests and one for the detection of the enforcement of unauthenticated requests.

The enforcement detector filters will allow Autorize to detect authentication and authorization enforcement in the response of the server by content length or string (literal string or regex) in the message body, headers or in the full request.

For example, suppose a request's enforcement status is detected as "Authorization enforced??? (please configure enforcement detector)". You can investigate the modified/original/unauthenticated responses and notice that the modified response body includes the string "You are not authorized to perform action". Add a filter with that string as its fingerprint value, and Autorize will look for the fingerprint and automatically detect that authorization is enforced. The same can be done with a content-length filter or a fingerprint in the headers.
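The fingerprint check itself is simple string matching. A hypothetical sketch of the classification logic (not Autorize's actual code; the function name and statuses just mirror the labels above):

```python
def enforcement_status(modified_response: str, fingerprints: list[str]) -> str:
    """Classify a modified (low-privileged) response the way a fingerprint
    filter would: if any configured fingerprint appears in the response,
    authorization was enforced; otherwise it looks bypassed.
    Illustrative only -- real detection can also use content length or
    headers, and regex fingerprints.
    """
    for fp in fingerprints:
        if fp in modified_response:
            return "Enforced!"
    return "Bypassed!"

print(enforcement_status(
    "<html>You are not authorized to perform action</html>",
    ["You are not authorized to perform action"],
))
```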

Interception Filters

The interception filters let you configure which domains should be intercepted by the Autorize plugin. You can filter by blacklist, whitelist, regex, or items in Burp's scope, to avoid intercepting unnecessary domains and keep your work organized.

Example of interception filters (note that there is a default filter to avoid scripts and images):

Authors



from Hacker News https://ift.tt/wTn2iDq

Wednesday, December 27, 2023

Suggestions: A simple human-readable format for suggesting changes to text files

Motivation

Many word processors have built-in change management. Authors can suggest changes and add comments, then an editor can accept or reject them.

Word screenshot

People who write documents using text-file-based formats like TeX or markdown have a problem: text files don’t have a concept of changes. This makes it harder to collaborate in teams. To get change management, they can:

  • Use an online editor, losing the flexibility of simple text files;
  • Use a version control system like git, which is complex and technical.

Suggestions files are a standard for changes for plain text. They let authors collaborate, suggest and review changes. They don’t require any special software, and they can be used on any kind of text file. You just edit the file as usual, and follow some simple rules.

File format

Making suggestions

To suggest new text to add to a file, enclose it in ++[ and ]++ tags like this:

The original text, ++[your addition,]++ 
and more text.

To suggest a deletion from a file, enclose it in --[ and ]-- tags like this:

The original text, --[text to delete,]-- 
and more text.

To make a comment, enclose it in %%[ and ]%%:

%%[Is this clearer? @stephen]%%

You can sign the comment with a @handle as the last word.

Reviewing suggestions

To review suggestions:

  • To accept a suggested addition, delete the ++[ and matching ]++, leaving everything between them.
  • To accept a suggested deletion, delete everything between --[ and ]-- inclusive.

Rejecting suggestions is just the other way round:

  • To reject an addition, delete everything between ++[ and ]++ inclusive.
  • To reject a deletion, delete the --[ and matching ]--.

You can also delete comments. Typically, you will have to do this before using the text file for another purpose.

If a tag (++[, ]++, --[, ]--, %%[ or ]%%) is on its own on a line, treat the subsequent newline as part of the tag and delete it:

A paragraph of text.
++[
A new line.
]++
The paragraph continues.

becomes

A paragraph of text.
A new line.
The paragraph continues.

if the addition is accepted, or

A paragraph of text.
The paragraph continues.

if it is rejected.
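These accept/reject rules are mechanical enough to script. A minimal sketch in Python (it handles only non-nested suggestions, and ignores signatures and the standalone-tag newline rule):

```python
import re

# Non-greedy matches so adjacent suggestions don't merge; DOTALL lets
# a suggestion span multiple lines.
ADD = re.compile(r'\+\+\[(.*?)\]\+\+', re.DOTALL)
DEL = re.compile(r'--\[(.*?)\]--', re.DOTALL)
COM = re.compile(r'%%\[.*?\]%%', re.DOTALL)

def accept_all(text: str) -> str:
    """Accept every suggestion: keep additions, apply deletions."""
    text = ADD.sub(r'\1', text)   # drop ++[ ]++ markers, keep the text
    text = DEL.sub('', text)      # remove deleted text entirely
    return COM.sub('', text)      # strip comments

def reject_all(text: str) -> str:
    """Reject every suggestion: drop additions, restore deletions."""
    text = ADD.sub('', text)
    text = DEL.sub(r'\1', text)
    return COM.sub('', text)

print(accept_all("The original text, ++[your addition,]++ and more text."))
```

Nested suggestions would need a real parser rather than regexes, since the tags can enclose one another.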

Multiple authors and nested suggestions

If multiple authors are working on a document, you may want to sign your suggested changes. Do that by putting your handle at the end of the change, just like for a comment. The handle must start with @ and must be the last word:

And God said, 
%%[first try! @wycliffe]%%
--[Light be made, 
and the light was made. @tyndale]-- 
++[Let there be lyghte 
and there was lyghte. @tyndale]++
++[Let there be light: 
and there was light. @kjv]++

You can nest suggestions within each other:

Last night I dreamt I went to Manderley
++[, the famous ++[Cornish @editor]++ 
seaside resort, @daphne ]++ again.

You can’t nest changes within comments (it would be too confusing). If you want to add to a comment, just write inside it with your handle. It’s only a comment anyway.

The rules for reviewing nested suggestions are the same as above. You may need to adjudicate between different alternatives. Obviously, if you accept someone’s deletion, any other suggestions inside it will be deleted and be irrelevant.

There is a command line tool suggs for working with suggestions files.

The purpose of suggs is to let you automate parts of the editing process. For example, you can edit a file, save a new version, then use suggs to create a suggestions file. Or you can take someone else’s suggestions file and quickly accept or reject all the changes. Lastly, suggs can display suggested changes in extra-readable formats, like colorized text or TeX.

Download it here:

Or get the source on github.

Usage

Print a suggestions file with additions, deletions and comments shown in color:

suggs colorize file.txt

Print file.txt with all suggestions accepted:

suggs new file.txt

Print file.txt with all suggestions rejected:

suggs old file.txt

Accept or reject all changes in-place, writing the result back to file.txt:

suggs accept file.txt
suggs reject file.txt

Create a suggestions file from the difference between old.txt and new.txt:

suggs diff old.txt new.txt

Print file.txt with changes highlighted as a TeX file:

suggs tex file.txt

Why not just use a diff file?

diff is a command that prints the difference between two text files. It’s widely used in the computing world. But diffs are designed for computers and code, not humans and text:

  • Diff output makes no sense without the original file. You can’t read changes in their original context. A suggestions file shows additions and deletions in context; it can be sent as an email attachment, read and understood.
  • Using and applying diffs requires command line tools. This is hard for non-technical authors. Suggestions files don’t require any command line tools, but you can use one if you like.
  • Diffs are typically line oriented. This makes them hard to read when only a word or phrase has changed.
  • You can’t put comments and authorship in a diff file.
  • A diff file only shows one set of changes. A suggestions file can show changes by multiple authors, including nested changes.

If you have a comment or suggestion, file an issue.

TeX tip

If you write comments like

%%[
% My comment here.
% ]%%

then TeX will also treat them as comments.



from Hacker News https://suggestions.ink

Tuesday, December 26, 2023

Clanging

Clanging (or clang associations) is a symptom of mental disorders, primarily found in patients with schizophrenia and bipolar disorder.[1] This symptom is also referred to as association chaining, and sometimes, glossomania.

Steuber defines it as "repeating chains of words that are associated semantically or phonetically with no relevant context".[2] This may include compulsive rhyming or alliteration without apparent logical connection between words.

Clanging refers specifically to behavior that is situationally inappropriate. While a poet rhyming is not evidence of mental illness, disorganized speech that impedes the patient's ability to communicate is a disorder in itself, often seen in schizophrenia.[3]

Example

This can be seen by a section of a 1974 transcript of a patient with schizophrenia:

We are all felines. Siamese cat balls. They stand out. I had a cat, a manx, still around here somewhere. You’ll know him when you see him. His name is GI Joe; he’s black and white. I have a goldfish too, like a clown. Happy Halloween down. Down.[4]

The speaker makes semantic chain associations on the topic of cats, to the colour of her cat, which (either the topic of colours/patterns, or the topic of pets) leads her to jump from her goldfish to the associated clown, a point she gets to via the word clownfish. The patient also exhibits a pattern of rhyming and associative clanging: clown to Halloween (presumably an associative clang) to down.

This example highlights how the speaker is distracted by the sound or meaning of their own words, and leads themselves off the topic, sentence by sentence. In essence, it is a form of derailment driven by self-monitoring.[5]

As a type of Formal Thought Disorder

Formal Thought Disorders (FTD) are a syndrome with several different symptoms, leading to thought, language and communication problems, being a core feature in schizophrenia.[6]

Thought disorders are measured using the Thought, Language and Communication Scale (TLC) developed by Andreasen in 1986.[6] This measures tendencies of 18 subtypes of formal thought disorder (with strong inter-coder reliability) including clanging as a type of FTD.

The TLC scale for FTD subtypes remains the standard and most inclusive measure, so clanging is officially recognised as a type of FTD.[2]

There has been much debate about whether FTDs are a symptom of thought or of language, yet the basis for FTD analysis is the verbal behaviour of the patients. As a result, whether abnormal speech among individuals with schizophrenia results from abnormal neurology, abnormal thought, or abnormal linguistic processes, researchers do agree that people with schizophrenia have abnormal language.[2]

Occurrences in mental disorders

Clanging is associated with the irregular thinking apparent in psychotic mental illnesses (e.g. mania and schizophrenia).[7]

In schizophrenia

Formal Thought Disorders are one of five characteristic symptoms of schizophrenia according to the DSM-IV-TR.[1] FTD symptoms such as glossomania are correlated with schizophrenia spectrum disorders, and with a family history of schizophrenia.[1] In an analysis of speech in patients with schizophrenia compared to controls, Steuber found glossomania (association chaining) to be characteristic of the schizophrenic patients' speech, although the difference between the groups did not reach statistical significance.[2]

In mania/bipolar disorder

Gustav Aschaffenburg found that manic individuals generated these "clang-associations" roughly 10–50 times more than non-manic individuals.[8] Aschaffenburg also found that the frequency of these associations increased for all individuals as they became more fatigued.[9]

Andreasen found that, when comparing Formal Thought Disorder symptoms between people with schizophrenia and people with mania, there was a greater reported incidence of clang associations in people with mania.[6]

In depression

Research reviewed by Steuber found no significant difference in glossomania occurrence between patients with schizophrenia and patients with depression.[2]

Disagreements in the literature

As a niche area within the symptoms of mental disorders, clanging has attracted disagreements over its definition, and over how it may or may not fall under the subset of Formal Thought Disorder symptoms in schizophrenia. Steuber argues that although clanging is an FTD, it should come under the umbrella of the subtype 'distractibility'.[2]

Moreover, due to limited research there have been discrepancies in the definition of clanging used. An alternative definition is: “word selection based on phonemic relatedness, rather than semantic meaning; frequently manifest as rhyming”. Under this alternative, semantic association chains are not included, even though the definition given at the start[2] is the more widely used one for both clanging and glossomania (where the terms are used interchangeably).

Biological factors

Attempts to understand such language impairments and FTDs have taken a biological approach.

Candidate genes for such vulnerability to schizophrenia include FOXP2 (which is linked to a familial language disorder and to autism) and dysbindin 1.[1] This distal explanation does not account for clanging specifically, and it also leaves out environmental influences on the development of schizophrenia. Moreover, developing schizophrenia does not guarantee that a person will have the symptom of clanging.

Sass and Pienkos suggest that a more nuanced understanding of the structural (neural) changes that occur in a sufferer's brain may be a first step toward understanding the disorder.[10] However, more research is required not only into the causes of such symptoms, but also into how they work.


References

  1. ^ a b c d Radanovic, Marcia; Sousa, Rafael T. de; Valiengo, L.; Gattaz, Wagner Farid; Forlenza, Orestes Vicente (18 December 2012). "Formal Thought Disorder and language impairment in schizophrenia". Arquivos de Neuro-Psiquiatria. 71 (1): 55–60. doi:10.1590/S0004-282X2012005000015. PMID 23249974.
  2. ^ a b c d e f g Steuber 2011, p. .
  3. ^ Covington, Michael A.; He, Congzhou; Brown, Cati; Naçi, Lorina; McClain, Jonathan T.; Fjordbak, Bess Sirmon; Semple, James; Brown, John (September 2005). "Schizophrenia and the structure of language: The linguist's view". Schizophrenia Research. 77 (1): 85–98. doi:10.1016/j.schres.2005.01.016. PMID 16005388. S2CID 7206375.
  4. ^ Chaika, Elaine (July 1974). "A linguist looks at 'schizophrenic' language". Brain and Language. 1 (3): 257–276. doi:10.1016/0093-934X(74)90040-6.
  5. ^ Covington, Michael A.; He, Congzhou; Brown, Cati; Naçi, Lorina; McClain, Jonathan T.; Fjordbak, Bess Sirmon; Semple, James; Brown, John (September 2005). "Schizophrenia and the structure of language: The linguist's view". Schizophrenia Research. 77 (1): 85–98. doi:10.1016/j.schres.2005.01.016. PMID 16005388. S2CID 7206375.
  6. ^ a b c Andreasen, Nancy C.; Grove, William M. (1986). "Thought, language, and communication in schizophrenia: diagnosis and prognosis". Schizophrenia Bulletin. 12 (3): 348–359. doi:10.1093/schbul/12.3.348. PMID 3764356.
  7. ^ Peralta, Victor; Cuesta, Manuel J.; de Leon, Jose (March 1992). "Formal thought disorder in schizophrenia: A factor analytic study". Comprehensive Psychiatry. 33 (2): 105–110. doi:10.1016/0010-440X(92)90005-B. PMID 1544294.
  8. ^ Kraepelin, Emil (1921). Manic-depressive insanity and paranoia. Edinburgh: E. & S. Livingstone. p. 32. ISBN 978-0-405-07441-7. OCLC 1027792347.
  9. ^ Spitzer, Manfred (1999). "Semantic Networks". The Mind within the Net. doi:10.7551/mitpress/4632.003.0015. ISBN 978-0-262-28416-5. S2CID 242159639.
  10. ^ Sass, Louis; Pienkos, Elizabeth (September 2015). "Beyond words: linguistic experience in melancholia, mania, and schizophrenia". Phenomenology and the Cognitive Sciences. 14 (3): 475–495. doi:10.1007/s11097-013-9340-0. S2CID 254947008.




from Hacker News https://ift.tt/QATsD1C

Nintendo Switch's iGPU: Maxwell Nerfed Edition

Graphics performance is vital for any console chip. Nintendo selected Nvidia’s Tegra X1 for their Switch handheld console. Tegra X1 is designed to maximize graphics performance in a limited power envelope, making it a natural choice for a console. And naturally for a Nvidia designed SoC, the Tegra X1 leverages the company’s Maxwell graphics architecture.

From the Tegra X1 Series Embedded Datasheet

Maxwell is better known for serving in Nvidia’s GTX 900 series discrete GPUs. There, it provided excellent performance and power efficiency. But Maxwell was primarily designed to serve in discrete GPUs with substantial area and power budgets. To fit Tegra X1’s low power requirements, Maxwell had to adapt to fit into a smaller power envelope.

Today, we’ll be running a few microbenchmarks on Nvidia’s Tegra X1, as implemented in the Nintendo Switch. We’re using Nemes’s Vulkan microbenchmark because the Tegra X1 does not support OpenCL. I also couldn’t get CUDA working on the platform.

Overview

Tegra X1 implements two Maxwell Streaming Multiprocessors, or SMs. SMs are basic building blocks in Nvidia’s GPUs and roughly analogous to CPU cores. As a Maxwell derivative, the Tegra X1’s SMs feature a familiar four scheduler partitions each capable of executing a 32-wide vector (warp) per cycle.

Tegra X1’s Maxwell is a bit different from the typical desktop variant. Shared memory, which is a fast software managed scratchpad, sees its capacity cut from 96 KB to 64 KB. Lower end Maxwell parts like the GM107 used in the GTX 750 Ti also have 64 KB of Shared Memory in their SMs, so there’s a chance Tegra could be using the GM107 Maxwell flavor. But L1 cache size is cut in half too, from 24 KB to 12 KB per two SM sub partitions (SMSPs). I don’t know if GM107 uses the smaller cache size. But even if it does, Tegra Maxwell sets itself apart with packed FP16 execution4, which can double floating point throughput (subject to terms and conditions).

Besides having less fast storage in each SM, Nintendo has chosen to run the iGPU at a low 768 MHz. For comparison, the EVGA GTX 980 Ti also tested here boosts at up to 1328 MHz, and typically runs above 1200 MHz. This high clock speed is shared with GM107, which averages around 1140 MHz.

Tegra X1’s datasheet suggests the iGPU can run at 1 GHz

Tegra X1’s low iGPU clock could be unique to the Nintendo Switch. Nvidia’s datasheet states the GPU should be capable of 1024 FP16 GFLOPs at reasonable temperatures. Working backwards, 1024 FP16 GFLOPS would be achieved with each of the iGPU’s 256 lanes working on two packed operands and performing two operations (a fused multiply add) on them at 1 GHz. However, I don’t have access to any other Tegra X1 platforms. Therefore, the rest of this article will evaluate Tegra X1 as implemented in the Switch, including the low clocks set by Nintendo.
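The working-backwards arithmetic checks out; the 768 MHz figure below is my own extrapolation, assuming the Switch achieves the same FP16 packing:

```python
lanes = 256        # 2 SMs x 4 scheduler partitions x 32-wide warps
fp16_pack = 2      # two FP16 values packed per 32-bit register
ops_per_fma = 2    # a fused multiply-add counts as two operations

# Datasheet clock (1 GHz) vs the Switch's 768 MHz
datasheet_gflops = lanes * fp16_pack * ops_per_fma * 1.0e9 / 1e9
switch_gflops = lanes * fp16_pack * ops_per_fma * 768e6 / 1e9
print(datasheet_gflops, switch_gflops)  # 1024.0 786.432
```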

Cache and Memory Latency

Tegra X1’s iGPU sees high latency throughout its memory hierarchy due to low clocks. If we focus on clock cycle counts, both the GTX 980 Ti and Tegra’s iGPU have about 110 cycles of L1 cache latency. Even though Tegra X1 has a smaller L1 cache running at lower clocks, Nvidia was unable to make the pipeline shorter. L2 cache latency is approximately 166 cycles on Tegra X1 compared to the GTX 980 Ti’s 257 cycles. Tegra taking fewer cycles to access its 256 KB L2 makes sense because the intra-GPU interconnect is smaller. But it’s not really a victory because the desktop part’s higher clock speed puts it ahead in absolute terms. Finally, VRAM latency is very high at over 400 ns.
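To see why fewer L2 cycles isn't a victory in absolute terms, convert cycle counts to nanoseconds using each GPU's clock (using the 980 Ti's 1328 MHz boost clock; its sustained clock is somewhat lower):

```python
def cycles_to_ns(cycles: float, clock_hz: float) -> float:
    """Convert a latency in GPU clock cycles to nanoseconds."""
    return cycles / clock_hz * 1e9

tegra_l2 = cycles_to_ns(166, 768e6)       # ~216 ns
gtx980ti_l2 = cycles_to_ns(257, 1328e6)   # ~194 ns
print(tegra_l2, gtx980ti_l2)
```

Despite needing roughly 90 more cycles, the desktop part's L2 comes out ahead in wall-clock time.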

Data gathered using Vulkan, with Nemes’s test suite

Intel’s Gen 9 (Skylake) integrated graphics provides an interesting comparison. Skylake’s GT2 graphics are found across a wide range of parts, and the HD 630 variant in later Skylake-derived generations is similar. While not designed primarily for gaming, it can be pressed into service by gamers without a lot of disposable income.

Intel has an interesting scheme for GPU caching. Global memory accesses from compute kernels go straight to an iGPU wide cache, which Intel calls a L3. To reduce confusion, I’ll use “L3” to refer to the iGPU’s private cache, and LLC to refer to the i5-6600K’s 6 MB of cache shared by the CPU and iGPU. The HD 530’s L3 has 768 KB of physical capacity2, but only part of it is allocated as cache for the shader array. Since I ran this test with the iGPU driving a display, 384 KB is available for caching. Despite having more caching capacity than Tegra X1’s L2, Intel’s L3 achieves lower latency.

AMD’s Raphael iGPU is a fun comparison. I don’t think a lot of people are gaming on Zen 4 iGPUs, but it is a minimum size RDNA 2 implementation. Like the Switch’s iGPU, Raphael’s iGPU has 256 KB of last level cache. But advances in GPU architecture and process nodes let Raphael’s iGPU clock to 2.2 GHz, giving it a massive latency lead.

Cache and Memory Bandwidth

GPUs tend to be bandwidth hungry. The Switch is notably lacking in this area, especially when the SMs have to pull data from the 256 KB L2, which provides 46.1 GB/s. At 768 MHz, this is just above 60 bytes per cycle, so Tegra X1 could just have a single 64B/cycle L2 slice. If so, it’s the smallest and lowest bandwidth L2 configuration possible on a Maxwell GPU.
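The bytes-per-cycle figure falls straight out of the measured bandwidth and the clock:

```python
# Measured L2 bandwidth divided by GPU clock gives bytes per cycle.
l2_bandwidth = 46.1e9   # bytes/s, measured
clock = 768e6           # Hz, Switch GPU clock
l2_bytes_per_cycle = l2_bandwidth / clock
print(l2_bytes_per_cycle)  # ~60, consistent with one 64B/cycle L2 slice
```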

Not all 192 KB of storage in each of Intel’s L3 slices is usable as cache

Intel’s HD 530 runs at higher clock speeds and has a quad-banked L3. Each L3 bank can deliver 64B/cycle3, but L3 bandwidth is actually limited by the shader array. Each of the HD 530’s three subslices can consume 64B/cycle, for 192B/cycle total. The HD 530 isn’t a big GPU, but it does have a larger and higher bandwidth cache. As we get out of the small GPU-private caches, HD 530 can achieve higher bandwidth from the i5-6600K’s shared last level cache. The Tegra X1 drops out into DRAM sooner.

In main memory, Tegra X1 turns in a relatively better performance. Unlike the CPU, which couldn't get even 10 GB/s from DRAM, the iGPU can utilize most of the LPDDR4 setup's available bandwidth. It's still not as good as desktop DDR4, but now the HD 530 only has a 23% advantage.

Raphael’s iGPU has a massive bandwidth advantage over Tegra X1 throughout the memory hierarchy. RDNA 2 is designed to deliver very high bandwidth and even a minimal implementation is a force to be reckoned with. High clock speeds and 128 byte/cycle L2 slices give Raphael’s iGPU a high cache bandwidth to compute ratio. At larger test sizes, the 7950X3D’s dual channel DDR5-5600 setup shows what modern DRAM setups are capable of. The Switch gets left in the dust.

What happens if we compare the Switch’s Maxwell implementation to desktop Maxwell?

The Switch cannot compare to a desktop with a discrete GPU. Enough said.

Compute Throughput

Tegra X1 uses Maxwell SMs similar to those found in desktop GPUs. Each SM has four scheduler partitions, each with a nominally 32-wide execution unit [4]. Nvidia uses 32-wide vectors, or warps, so each partition can generally execute one instruction per cycle. Rarer operations like integer multiplies or FP inverse square roots execute at quarter rate.
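The issue rates described above can be sketched as a small per-SM lookup. The quarter-rate figure for integer multiply and inverse square root comes from the text; treat the op list as illustrative, not exhaustive:

```python
# Per-SM issue-rate sketch for Maxwell: four scheduler partitions, each with a
# 32-wide execution unit. Common ops run at full rate; rarer ops at quarter rate.
RATES = {"fp32_fma": 1.0, "int32_mul": 0.25, "rsqrt": 0.25}  # ops/lane/clock

def sm_ops_per_clock(op: str, partitions: int = 4, lanes: int = 32) -> float:
    """Peak operations per clock for one SM, given the rate table above."""
    return partitions * lanes * RATES[op]

print(sm_ops_per_clock("fp32_fma"))   # 128.0 ops/clock per SM
print(sm_ops_per_clock("int32_mul"))  # 32.0 ops/clock per SM
```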

The Switch enjoys throughput comparable to Intel’s HD 530 for most basic operations. It’s also comparable for special operations like inverse square roots. Intel pulls ahead for integer multiplication performance, though that’s not likely to make a difference for games.

As mentioned earlier, Tegra X1’s Maxwell gets hardware FP16 support. Two FP16 values can be packed into the lower and upper halves of a 32-bit register, and if the compiler can pack values that way, FP16 can execute at double rate. Unfortunately, Nvidia’s compiler wasn’t able to do the FP16 packing here. AMD and Intel do enjoy double-rate FP16 execution. AMD’s FP16 execution scheme works the same way and also requires packing, so it’s a bit weird that Nvidia misses out.
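The packing scheme itself is easy to illustrate: two IEEE-754 half-precision values occupy the lower and upper 16 bits of one 32-bit register. A minimal sketch using Python’s standard struct module (the function names are mine, not Nvidia’s):

```python
import struct

# Two FP16 values packed into the halves of one 32-bit register, mirroring the
# scheme described above. Python's 'e' struct format is IEEE-754 half precision.
def pack_half2(lo: float, hi: float) -> int:
    lo_bits, = struct.unpack('<H', struct.pack('<e', lo))
    hi_bits, = struct.unpack('<H', struct.pack('<e', hi))
    return (hi_bits << 16) | lo_bits   # hi in upper 16 bits, lo in lower 16 bits

def unpack_half2(reg: int) -> tuple[float, float]:
    lo, = struct.unpack('<e', struct.pack('<H', reg & 0xFFFF))
    hi, = struct.unpack('<e', struct.pack('<H', (reg >> 16) & 0xFFFF))
    return lo, hi

reg = pack_half2(1.5, -2.25)           # both values are exactly representable in FP16
print(hex(reg), unpack_half2(reg))
```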

However, we can verify the Switch’s increased FP16 throughput with vkpeak. Vkpeak focuses on peak throughput with fused multiply add operations, and can achieve higher FP16 throughput when using 4-wide vectors.

Vkpeak counts a fused multiply add as two operations

Even with higher FP16 throughput, the Switch falls behind Intel and AMD’s basic desktop integrated GPUs. Tegra X1 does give a good account of itself with 16-bit integer operations. However, I expect games to stick with FP32 or FP16, with 32-bit integers used for addressing and control flow.

Vulkan Compute Performance (VkFFT)

VkFFT uses the Vulkan API to compute Fast Fourier Transforms. Here, we’re looking at the first set of subtests (VkFFT FFT + iFFT C2C benchmark 1D batched in single precision). The first few subtests appear very memory bandwidth bound on the RX 6900 XT, and I expect similar behavior on these smaller GPUs.
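For a rough sense of what “bandwidth bound” implies here: each single-precision C2C pass streams the buffer in and out at least once, which caps transforms per second. A sketch with assumed peak bandwidth figures (25.6 GB/s for a 64-bit LPDDR4-3200 bus and 34.1 GB/s for 128-bit DDR4-2133 are nominal peaks I’m plugging in for illustration, not measured numbers from the article):

```python
# Bandwidth-bound ceiling for a C2C FP32 FFT: each pass reads and writes the
# whole buffer once, so N complex values move at least 2 * N * 8 bytes per pass.
def max_ffts_per_second(bandwidth_gbs: float, n_complex: int, passes: int = 1) -> float:
    bytes_moved = passes * 2 * n_complex * 8   # read + write, 8 bytes per complex FP32
    return bandwidth_gbs * 1e9 / bytes_moved

# Assumed nominal peaks: Switch-like 64-bit LPDDR4 vs HD 530-like 128-bit DDR4-2133.
print(max_ffts_per_second(25.6, 2**20))  # 1M-point transforms/s ceiling, Switch-like
print(max_ffts_per_second(34.1, 2**20))  # same ceiling with the wider DDR4 bus
```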

Intel’s lead in subtests 3 through 11 likely comes from a memory bandwidth advantage. HD 530’s DDR4-2133 isn’t great by modern standards, but a 128-bit memory bus is better than the 64-bit LPDDR4 memory bus on the Switch.

VkFFT outputs estimated bandwidth figures alongside scores. Some of the later subtests may not be bandwidth bound, as the bandwidth figures are far below theoretical. But Intel’s HD 530 still pulls ahead, likely thanks to its higher compute throughput.

CPU to GPU Uplink Performance

Integrated GPUs typically can’t compete with larger discrete cards in compute performance or memory bandwidth. But they can compare well in how fast the CPU and GPU can communicate, because discrete cards are constrained by a relatively low-bandwidth PCIe interface.

The Nintendo Switch’s Tegra X1 enjoys decent bandwidth between the CPU and GPU memory spaces, and is likely held back by how fast the CPU can access memory. However, it loses in absolute terms to Nvidia’s GTX 980 Ti. Against the Tegra X1’s limited memory bandwidth, a 16x PCIe 3.0 interface can still compare well. Intel’s HD 530 turns in a similar performance when using the copy engine. But moving data with compute shaders provides a nice uplift, giving Intel the edge against the Switch.
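For context on why a PCIe link can still “compare well,” the nominal PCIe 3.0 x16 figure a discrete card works with follows from the per-lane rate and 128b/130b line encoding (a peak number that ignores protocol overhead, so real transfers land somewhat lower):

```python
# Nominal PCIe 3.0 x16 bandwidth: 8 GT/s per lane, 16 lanes, 128b/130b encoding.
lanes = 16
raw_gbit_per_lane = 8.0          # GT/s, one bit per transfer
encoding_efficiency = 128 / 130  # 128b/130b line code

pcie3_x16_gbs = lanes * raw_gbit_per_lane * encoding_efficiency / 8  # bits -> bytes
print(f"{pcie3_x16_gbs:.2f} GB/s")  # ~15.75 GB/s each direction
```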

Final Words

Tegra X1 shows the challenge of building a GPU architecture that scales across a wide range of power targets. Maxwell was built for big GPUs and has a large basic building block: implementations can only be scaled 128 lanes at a time. Contrast that with Intel’s iGPU architecture, which can scale 8 lanes at a time by varying the number of scheduler partitions within a subslice. The equivalent in Nvidia’s world would be changing the number of scheduler partitions in an SM, adjusting GPU size 32 lanes at a time. Of course, Maxwell can’t do that. Adjusting GPU size 128 lanes at a time is totally fine when your GPU has over 2,000 lanes. With a GPU that’s just 256 lanes wide, Nvidia’s hands are tied in how closely it can hit its targets.
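The granularity argument can be made concrete: for a given lane-count target, the nearest sizes each architecture can actually build differ a lot. A toy sketch (the 300-lane target is arbitrary, chosen only to show the rounding):

```python
# Nearest buildable GPU sizes for a lane-count target, given an architecture's
# scaling granularity: 128 lanes/step for Maxwell SMs, 8 lanes/step for Gen9 EUs.
def nearest_sizes(target_lanes: int, granularity: int) -> tuple[int, int]:
    below = (target_lanes // granularity) * granularity
    return below, below + granularity

print(nearest_sizes(300, 128))  # Maxwell: one SM at a time  -> (256, 384)
print(nearest_sizes(300, 8))    # Intel Gen9: one EU at a time -> (296, 304)
```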

On the Switch, Nintendo likely thought Nvidia’s default targets were a bit too high on the power and performance curve. The Switch runs Tegra X1’s iGPU at 768 MHz even though Nvidia’s documents suggest 1000 MHz should be typical. I wonder if the Switch would do better with a hypothetical 192-lane Maxwell implementation at 1000 MHz. Higher GPU clocks would improve performance for fixed-function graphics blocks like rasterizers and render output units, even if theoretical compute throughput is similar. A smaller, faster-clocked GPU would also require lower occupancy to exercise all of its execution units, though that’s unlikely to be a major issue because the Switch’s iGPU is so small already.
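To put numbers on “theoretical compute throughput is similar”: counting a fused multiply-add as two FLOPs, the actual 256-lane/768 MHz configuration and the hypothetical 192-lane/1000 MHz one land within a few percent of each other:

```python
# Theoretical FP32 throughput for a Maxwell configuration, counting FMA as 2 FLOPs.
def gflops(lanes: int, clock_ghz: float) -> float:
    return lanes * 2 * clock_ghz

print(gflops(256, 0.768))  # actual Switch config: ~393 GFLOPS
print(gflops(192, 1.0))    # hypothetical 192-lane part at 1 GHz: 384 GFLOPS
```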

In terms of absolute performance, the Switch delivers a decent amount of graphics performance within a low power envelope. However, seeing a bog standard desktop iGPU win over a graphics-oriented console chip is eye opening. Even more eye opening is that developers are able to get recent AAA games ported to the Switch. Intel’s ubiquitous Skylake GT2 iGPU is often derided as being inadequate for serious gaming. Internet sentiment tends to accept leaving GPUs like the HD 530 behind in pursuit of better effects achieved on higher end cards.

Nintendo’s Switch shows this doesn’t have to be the case. If developers can deliver playable experiences on the Switch, they likely can do so on an HD 530 too. No doubt such optimizations require effort, but carrying them out may be worth the reward of making PC gaming more accessible. Younger audiences in particular may not have the disposable income necessary to purchase current discrete cards, especially as GPU price increases at every market segment have outpaced inflation.

If you like our articles and journalism, and you want to support us in our endeavors, then consider heading over to our Patreon or our PayPal if you want to toss a few bucks our way. If you would like to talk with the Chips and Cheese staff and the people behind the scenes, then consider joining our Discord.

References

  1. Nvidia Tegra X1 Series Processors Data Sheet, Maxwell GPU + ARM v8
  2. Programmer’s Reference Manual for the 2015–2016 Intel Core, Celeron, and Pentium Processors based on the “Skylake” Platform, Volume 4: Configurations
  3. The Compute Architecture of Intel Processor Graphics Gen9
  4. Nvidia Tegra X1 Whitepaper: Nvidia’s New Mobile Superchip


from Hacker News https://ift.tt/8cFuwsA

Monday, December 25, 2023

CrayZee Eighty

Ever wished that you had a Cray 1 Supercomputer? Ever wondered if an RC2014 backplane could wrap around a cylinder? Ever thought about how many retweets a Z80 drawing a Mandelbrot fractal could get? Ever had an idea that’s so daft, the only way to exorcise it is to do it? If so, would you like to Seymore…

Like most ideas in Lockdown, things started with a throwaway comment on Twitter and quickly escalated to laser cutting a toilet roll. I blame Shirley Knott.

So, as a practical joke, the homage to the powerful Cray 1 (and also the less powerful Rolodex) worked surprisingly well. This inevitably led to the question of making it work for real.

Taking some measurements from the toilet roll, I laser cut a simple jig to hold twelve 40-pin sockets around 270 degrees, with the intention of soldering wire from pin to pin in situ. This quickly demonstrated that it just wasn’t practical to get the soldering iron into such a tight area.

Another jig was made to hold the sockets at an even spacing so they could be connected with brass wire, with the intention of bending the whole assembly around afterwards. It quickly became apparent that this wasn’t going to work either.

Luckily OSHPark offer a flex PCB option. I’ve been aware of this for a while, and wanted to try it, but there hadn’t been anything suitable within the RC2014 ecosystem. (Well, there have been requests for a Floppy Module, but I don’t think anybody actually wants a module which is floppy!). At $10 per square inch, it isn’t cheap, but, after a bit of KiCad work, the smallest 12 slot RC2014 backplane was ordered.

Soldering through hole components on to flex PCB is not easy, and 480 solder joints generate a lot of heat which will warp the plastic if it is not done carefully in a controlled manner. The Flex PCB was designed to fit the existing jigs, and when soldered up, it fitted perfectly!

Using the jig dimensions, I was able to 3D print a couple of end caps which held the slots in place and made things much more solid. I filled it with a bunch of spare modules and tested out if the backplane itself worked…

Houston, we have a problem! Nothing came up when I plugged in an FTDI cable :-(

A few hours were wasted going down different rabbit holes chasing too many red herrings. The modules I’d put together essentially made up an RC2014 Zed, and were picked from some of my non-current module archive. What I’d forgotten is that old versions of RomWBW which are built for use with a DS1302 RTC Module will hang for about 2 minutes on startup if the RTC cannot be found. So, in fact, it was all working perfectly; I just had to wait a little while after plugging in!

A quick upgrade to RomWBW v3.0.1 overcomes this problem, and should have been done right at the start!

To make things more Cray-like, I redesigned the end caps to be open at the top and bottom, and extended the lower one to support a laser cut skirt. One day this will house an IDE hard drive, but for now, it’s just there to mimic the bench seat on the Cray 1.

The irony is not lost on me that the Pi Zero, which is only used to generate HDMI from serial data, is several orders of magnitude more powerful than the Cray 1, which is, itself, way more powerful than the Z80 which is calling all the shots!

There are no plans to release this as a product at this stage. The price would be too high to justify for a kit which really is not very practical at all.



from Hacker News https://ift.tt/SPhfQe1