Sunday, July 31, 2022

Gitea 1.17.0 is released – includes package registry support

Sun Jul 31, 2022 by The Gitea Authors

We are proud to present the release of Gitea version 1.17.0, a relatively big release with a lot of new and exciting features and plenty of breaking changes. We highly encourage users to update to this version for some important bug fixes, after carefully reading about the breaking changes.

645 Pull Requests were merged to release this version.

You can download one of our pre-built binaries from our downloads page - make sure to select the correct platform! For further details on how to install, follow our installation guide.

We would also like to thank all of our supporters on Open Collective who are helping to sustain us financially.

Read on to learn about major new features and breaking changes.

Have you heard? We now have a swag shop! πŸ‘• 🍡


Major New Features

πŸš€ Package Registry (#16510)

Thanks to @KN4CK3R, Gitea now includes a package registry for various package managers (Composer, Conan, Generic, Helm, Maven, npm, NuGet, OCI Containers (Docker), PyPI and RubyGems). This will be very useful for teams that want to deploy their software from their own infrastructure.

To start using it, head over to the extensive documentation for this feature.
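As a quick illustration, publishing an npm package to your instance only requires pointing npm at the new registry endpoint. The host, owner, and token below are placeholders; check the linked documentation for the exact endpoint of each package type:

```ini
; .npmrc — illustrative values only
registry=https://gitea.example.com/api/packages/OWNER/npm/
//gitea.example.com/api/packages/OWNER/npm/:_authToken=YOUR_ACCESS_TOKEN
```

With this in place, a regular npm publish uploads the package to your own Gitea instance.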

πŸš€ Cherry-pick, revert, apply-patch from the Web UI (#17902)

Thanks to @zeripath, you no longer need to switch to your local checkout to do common git operations like cherry-pick, revert, apply-patch. Instead, Gitea now provides a handy UI for these tasks!

πŸš€ Better Mobile Experience (#19546 et al.)

@Gusted worked to significantly improve the mobile experience of Gitea: A lot of the frontend has been refactored without making major changes to the current UI.

It now has a more responsive design, making browsing Gitea on mobile phones a much more pleasant experience!

Check out the linked Pull Request for a demo & comparison of the updated layouts.

πŸš€ Improved file navigation (#19007 & #15028)

Navigating many files is often a struggle. This release features some workflow improvements you may already know from other forges:

  • During PR review, you can now mark changed files as reviewed and be informed about later changes to them (#19007). You can watch a demo in the linked Pull Request.

  • @rogerluo410 and @wxiaoguang implemented a Go to file feature for the repo code listing (#15028); a go to file demo is included in the linked Pull Request.

RSS-users will appreciate the new feeds for organizations and repositories that were added by @6543.

πŸš€ Auto merge pull requests when all checks succeeded (#19648 & #9307)

Thanks to @6543 & @kolaente, Gitea now allows you to schedule a Pull Request to be merged automatically once all required checks have passed, either via the WebUI or via the API. Note that this feature is only enabled when the target branch has branch protection.

πŸš€ Allow edits by maintainers (#18002)

Thanks to @qwerty287, Gitea now allows you to decide if maintainers of the upstream repository can push to the head branch of your PR when creating a Pull Request from a fork. This enables a workflow similar to how Gitea itself is maintained currently and can simplify the PR workflow for open source development.

πŸš€ Permanent issue (and PR) deletion (#19032)

Thanks to @fnetX, to combat spam and confidential information, issues (and subsequently PRs) can now be permanently deleted.
A repository admin or instance admin can find the delete button at the bottom of the sidebar of the issue or pull request.

πŸš€ Generate table of contents in wikis (#19873)

Thanks to @zeripath, wiki pages now show their logical structure automatically in the sidebar. This removes the need to manually maintain a ToC and helps you to skim for the most important sections.

πŸš€ Customizing the default commit messages (#18177)

Thanks to @lunny, you can now set the default merge message used for merging PRs.
The customization files must be in .gitea/default_merge_message/<uppercase_merge_style>_TEMPLATE.md.
More information on the file names and possible variables can be found here.
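For example, a template for the default merge style could look like the following. The file name and variables shown are assumptions based on the linked documentation; verify them there before use:

```markdown
<!-- .gitea/default_merge_message/MERGE_TEMPLATE.md (illustrative sketch) -->
Merge pull request '${PullRequestTitle}' (#${PullRequestIndex}) from ${HeadBranch} into ${BaseBranch}
```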

πŸš€ Keeping the original ID of issues (#18446)

When you migrate a repository including its issues, their original ID will be persisted. This is only the first step to allow complete mirroring of issues and Pull Requests from other (Gitea, GitHub, GitLab, …) instances, with more to come in later releases.

πŸš€ Federation Progress (#19561 & #19462)

Gitea 1.17 lays the foundation for instances to communicate with each other in the future:
A new API endpoint that allows instances to exchange basic statistics was added (#19561).
Additionally, with #19462, basic global data about users, such as the preferred avatar, can now be communicated.

Federation is under active development, and it will be interesting to see what will be achieved in the next few releases.
Stay tuned for what is yet to come!


Breaking Changes

❗ Internal Gitconfig (#19732)

Previously, Gitea used the user's gitconfig ($HOME/.gitconfig) in addition to the system gitconfig (/etc/gitconfig).
Now, Gitea uses the system gitconfig (/etc/gitconfig) combined with an internal gitconfig located in {[git].HOME_PATH}/.gitconfig. If you customized your user gitconfig for Gitea, you should add these customizations to one of the available gitconfigs. Additional git-relevant files that are normally in your user home directory, like $HOME/.gnupg, should be moved or copied to {[git].HOME_PATH}/ as well.
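For instance, a user-level customization that Gitea previously picked up from $HOME/.gitconfig would now need to live in the internal gitconfig. The values below are purely illustrative:

```ini
# {[git].HOME_PATH}/.gitconfig : example of a migrated customization
[core]
    quotePath = false
[http]
    postBuffer = 524288000
```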

❗ Email address validation restricted (#17688)

With this release, Gitea restricts what is seen as a valid email:
Email addresses must only contain characters from a-zA-Z0-9.!#$%&'*+-/=?^_{|}`~. Additionally, the first character must be alphanumeric (a-zA-Z0-9), and after the @, only characters in a-zA-Z0-9. may follow.
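A minimal sketch of this rule in Python. This mirrors the description above for illustration only; it is not Gitea's actual implementation:

```python
import re

# Illustrative sketch of the rule described above, NOT Gitea's actual code.
# Local part: listed characters only, starting with an alphanumeric.
# Domain part: alphanumerics and dots only.
LOCAL = r"[a-zA-Z0-9][a-zA-Z0-9.!#$%&'*+\-/=?^_`{|}~]*"
DOMAIN = r"[a-zA-Z0-9.]+"
EMAIL_RE = re.compile(rf"{LOCAL}@{DOMAIN}")

def is_valid_email(addr: str) -> bool:
    return EMAIL_RE.fullmatch(addr) is not None

print(is_valid_email("user.name+tag@example.com"))  # True
print(is_valid_email("_user@example.com"))          # False: must start alphanumeric
print(is_valid_email("user@exa_mple.com"))          # False: '_' not allowed after '@'
```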

❗ Renamed configuration options for ACME / Let’s Encrypt (#18340)

Configuration settings have been renamed from LETSENCRYPT to ACME. The old settings are deprecated and will be removed in 1.18, so you should migrate now.

  • ENABLE_LETSENCRYPT → ENABLE_ACME
  • LETSENCRYPT_URL → ACME_URL
  • LETSENCRYPT_ACCEPTTOS → ACME_ACCEPTTOS
  • LETSENCRYPT_DIRECTORY → ACME_DIRECTORY
  • LETSENCRYPT_EMAIL → ACME_EMAIL
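In app.ini, the migrated settings would look roughly like this (example values; to the best of our knowledge the ACME settings live in the [server] section, as the old LETSENCRYPT ones did):

```ini
[server]
ENABLE_ACME = true
ACME_ACCEPTTOS = true
ACME_EMAIL = admin@example.com
```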

❗ New logger format and configuration (#17308)

This PR substantially changes the logging format of the router logger.
If you use this logging for monitoring (e.g. fail2ban) you will need to update this to match the new format.
Refer to the documentation on the router logger for new configuration options.

❗ main as default branch (#19354)

The default value of the setting repository.DEFAULT_BRANCH was switched from master to main.
If you want to continue using master as the default branch name, you must set this explicitly.
This change is especially relevant for third-party tools that make assumptions about the default branch of a repository.
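To keep the old behavior, the setting can be pinned in app.ini:

```ini
[repository]
DEFAULT_BRANCH = master
```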

❗ Change initial trust model to committer (#18335)

Previously, Gitea would by default use the collaborator trust model.
This means only verified commits of collaborators were trusted.
This was quite an aggressive trust model, and it has now been changed to match GitHub's behavior of trusting the committer. As a result, verified commits in a repository from non-collaborators will no longer be marked as unverified.

If you rely on the old behavior, you must set DEFAULT_TRUST_MODEL to collaborator.
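A sketch of the relevant app.ini entry, assuming the setting sits in the [repository.signing] section as in the configuration cheat sheet:

```ini
[repository.signing]
DEFAULT_TRUST_MODEL = collaborator
```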

❗ Require Git >= 2.0 (#19577)

The minimum required Git version has been raised to 2.0.
Versions below that are now unsupported and will prevent the application from starting.
In general, it is recommended to stay up-to-date with your Git version as some Gitea features or optimizations can only be used once they are available in Git.

❗ Require Docker version >= 20.10.6 (#18050)

This is due to an incompatibility between older Docker versions and the libc of the new base image, Alpine 3.15.

❗ Require Go >= 1.18 to compile (#19918, #19099)

The minimum version of Go needed to compile Gitea has been increased to 1.18.

❗ Changed handling of custom logo (#18542)

It is now not only possible to set a custom logo, but also a custom favicon. If you are currently using a custom logo, you need to re-run the steps described here.

❗ RequireHighlightJS removed from templates (#19615)

If you use custom templates, check that they no longer use RequireHighlightJS, as it was already outdated and has now been removed.

❗ Reserved usernames updated (#18438)

The following usernames are now newly reserved: avatar, ssh_info, and swagger_v1.json.
The following usernames are no longer reserved: help, install, less, plugins, stars, and template.

If you want to check if you’re affected, please run the following Gitea doctor command:

gitea doctor --run check-user-name

Note that this command is only available after upgrading to 1.17.0.

❗ Deprecated SSH ciphers removed from default setting (#18697)

This only affects Gitea instances that have enabled the internal SSH server.
Previously, Gitea allowed insecure algorithms to be used for SSH connections.
Older versions of OpenSSH might not be able to connect to Gitea anymore.

❗ Display messages for users if the ROOT_URL is wrong, show JavaScript errors (#18971)

Previously, Gitea would allow an incorrect ROOT_URL to be set in the settings, which caused unexpected issues when users accessed Gitea via a different URL.
Therefore, Gitea will now show an error in the UI when this is the case.
Please check if your ROOT_URL is set correctly and avoid accessing the instance using other URLs to avoid the error message.
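For reference, the URL is configured in the [server] section of app.ini and should match exactly what users type into their browser:

```ini
[server]
ROOT_URL = https://gitea.example.com/
```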

❗ /api/v1/notifications does not include repo permissions (#19761)

Previously, /api/v1/notifications returned repository.permissions, but the permissions were calculated incorrectly.
Because of this, and because a dedicated route for querying repository permissions already exists, this field will always be null from now on.

❗ HTTP status codes updated: 302 → 307 and 301 → 308 (#18063)

Previously, Gitea often returned the semantically incorrect status codes Found (302) and Moved Permanently (301).
All such occurrences have now been changed to Temporary Redirect (307) and Permanent Redirect (308), respectively.
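The practical difference is that 307/308 forbid clients from changing the HTTP method when following the redirect (e.g. a redirected POST stays a POST), which matters for Git and API clients. A quick sanity check of the code values using Python's standard library:

```python
from http import HTTPStatus

# 302/301 historically let user agents rewrite a redirected POST into a GET;
# 307/308 require the method and body to be preserved, which is what Git
# clients and API consumers expect.
old_to_new = {
    HTTPStatus.FOUND: HTTPStatus.TEMPORARY_REDIRECT,              # 302 -> 307
    HTTPStatus.MOVED_PERMANENTLY: HTTPStatus.PERMANENT_REDIRECT,  # 301 -> 308
}

for old, new in old_to_new.items():
    print(f"{old.value} {old.phrase} -> {new.value} {new.phrase}")
```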

❗ No more admin notice about successful cron tasks (#19221)

Successful cron tasks no longer emit a notification by default.
This breaks any existing NO_SUCCESS_NOTICE settings.
If you want notices on success, you must set NOTICE_ON_SUCCESS=true instead.
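If we read the linked PR correctly, the new flag is set per cron task section in app.ini; the task name and placement below are assumptions, so consult the configuration cheat sheet before applying:

```ini
[cron.update_mirrors]
NOTICE_ON_SUCCESS = true
```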

Changelog

1.17.0 - 2022-07-30

  • BREAKING
    • Require go1.18 for Gitea 1.17 (#19918)
    • Make AppDataPath absolute against the AppWorkPath if it is not (#19815)
    • Nuke the incorrect permission report on /api/v1/notifications (#19761)
    • Refactor git module, make Gitea use internal git config (#19732)
    • Remove RequireHighlightJS field, update plantuml example. (#19615)
    • Increase minimal required git version to 2.0 (#19577)
    • Add a directory prefix gitea-src-VERSION to release-tar-file (#19396)
    • Use “main” as default branch name (#19354)
    • Make cron task no notice on success (#19221)
    • Add pam account authorization check (#19040)
    • Show messages for users if the ROOT_URL is wrong, show JavaScript errors (#18971)
    • Refactor mirror code & fix StartToMirror (#18904)
    • Remove deprecated SSH ciphers from default (#18697)
    • Add the possibility to allow the user to have a favicon which differs from the main logo (#18542)
    • Update reserved usernames list (#18438)
    • Support custom ACME provider (#18340)
    • Change initial TrustModel to committer (#18335)
    • Update HTTP status codes (#18063)
    • Upgrade Alpine from 3.13 to 3.15 (#18050)
    • Restrict email address validation (#17688)
    • Refactor Router Logger (#17308)
  • SECURITY
    • Use git.HOME_PATH for Git HOME directory (#20114) (#20293)
    • Add write check for creating Commit Statuses (#20332) (#20333)
    • Remove deprecated SSH ciphers from default (#18697)
  • FEDERATION
    • Return statistic information for nodeinfo (#19561)
    • Add Webfinger endpoint (#19462)
    • Store the foreign ID of issues during migration (#18446)
  • FEATURES
    • Automatically render wiki TOC (#19873)
    • Adding button to link accounts from user settings (#19792)
    • Allow set default merge style while creating repo (#19751)
    • Auto merge pull requests when all checks succeeded (#9307 & #19648)
    • Improve reviewing PR UX (#19612)
    • Add support for rendering console output with colors (#19497)
    • Add Helm Chart registry (#19406)
    • Add Goroutine stack inspector to admin/monitor (#19207)
    • RSS/Atom support for Orgs & Repos (#17714 & #19055)
    • Add button for issue deletion (#19032)
    • Allow to mark files in a PR as viewed (#19007)
    • Add Index to comment for migrations and mirroring (#18806)
    • Add health check endpoint (#18465)
    • Add packagist webhook (#18224)
    • Add “Allow edits from maintainer” feature (#18002)
    • Add apply-patch, basic revert and cherry-pick functionality (#17902)
    • Add Package Registry (#16510)
    • Add LDAP group sync to Teams (#16299)
    • Pause queues (#15928)
    • Added auto-save whitespace behavior if it changed manually (#15566)
    • Find files in repo (#15028)
    • Provide configuration to allow camo-media proxying (#12802)
    • Allow custom default merge message with .gitea/default_merge_message/<merge_style>_TEMPLATE.md (#18177)
  • API
    • Add endpoint to serve blob or LFS file content (#19689)
    • Add endpoint to check if team has repo access (#19540)
    • More commit info (#19252)
    • Allow to create file on empty repo (#19224)
    • Allow removing issues (#18879)
    • Add endpoint to query collaborators permission for a repository (#18761)
    • Return primary language and repository language stats API URL (#18396)
    • Implement http signatures support for the API (#17565)
  • ENHANCEMENTS
    • Make notification bell more prominent on mobile (#20108, #20236, #20251) (#20269)
    • Adjust max-widths for the repository file table (#20243) (#20247)
    • Display full name (#20171) (#20246)
    • Add dbconsistency checks for Stopwatches (#20010)
    • Add fetch.writeCommitGraph to gitconfig (#20006)
    • Add fgprof pprof profiler (#20005)
    • Move agit dependency (#19998)
    • Empty log queue on flush and close (#19994)
    • Remove tab/TabName usage where it’s not needed (#19973)
    • Improve file header on mobile (#19945)
    • Move issues related files into models/issues (#19931)
    • Add breaking email restrictions checker in doctor (#19903)
    • Improve UX on modal for deleting an access token (#19894)
    • Add alt text to logo (#19892)
    • Move some code into models/git (#19879)
    • Remove customized (unmaintained) dropdown, improve aria a11y for dropdown (#19861)
    • Make user profile image show full image on mobile (#19840)
    • Replace blue button and label classes with primary (#19763)
    • Remove fomantic progress module (#19760)
    • Allows repo search to match against “owner/repo” pattern strings (#19754)
    • Move org functions (#19753)
    • Move almost all functions’ parameter db.Engine to context.Context (#19748)
    • Show source/target branches on PR’s list (#19747)
    • Use http.StatusTemporaryRedirect(307) when serve avatar directly (#19739)
    • Add doctor orphan check for orphaned pull requests without an existing base repo (#19731)
    • Make Ctrl+Enter (quick submit) work for issue comment and wiki editor (#19729)
    • Update go-chi/cache to utilize Ping() (#19719)
    • Improve commit list/view on mobile (#19712)
    • Move some repository related code into sub package (#19711)
    • Use a better OlderThan for DeleteInactiveUsers (#19693)
    • Introduce eslint-plugin-jquery (#19690)
    • Tidy up <head> template (#19678)
    • Calculate filename hash only once (#19654)
    • Simplify IsVendor (#19626)
    • Add “Reference” section to Issue view sidebar (#19609)
    • Only set CanColorStdout / CanColorStderr to true if the stdout/stderr is a terminal (#19581)
    • Use for a repo action one database transaction (#19576)
    • Simplify loops to copy (#19569)
    • Added X-Mailer header to outgoing emails (#19562)
    • use middleware to open gitRepo (#19559)
    • Mute link in diff header (#19556)
    • Improve UI on mobile (#19546)
    • Fix Pull Request comment filename word breaks (#19535)
    • Permalink files In PR diff (#19534)
    • PullService lock via pullID (#19520)
    • Make repository file list useable on mobile (#19515)
    • more context for models (#19511)
    • Refactor readme file renderer (#19502)
    • By default force vertical tabs on mobile (#19486)
    • Github style following followers (#19482)
    • Improve action table indices (#19472)
    • Use horizontal tabs for repo header on mobile (#19468)
    • pass gitRepo down since its used for main repo and wiki (#19461)
    • Admin should not delete himself (#19423)
    • Use queue instead of memory queue in webhook send service (#19390)
    • Simplify the code to get issue count (#19380)
    • Add commit status popup to issuelist (#19375)
    • Add RSS Feed buttons to Repo, User and Org pages (#19370)
    • Add logic to switch between source/rendered on Markdown (#19356)
    • Move some helper files out of models (#19355)
    • Move access and repo permission to models/perm/access (#19350)
    • Disallow selecting the text of buttons (#19330)
    • Allow custom redirect for landing page (#19324)
    • Remove dependent on session auth for api/v1 routers (#19321)
    • Never use /api/v1 from Gitea UI Pages (#19318)
    • Remove legacy unmaintained packages, refactor to support change default locale (#19308)
    • Move milestone to models/issues/ (#19278)
    • Configure OpenSSH log level via Environment in Docker (#19274)
    • Move reaction to models/issues/ (#19264)
    • Make git.OpenRepository accept Context (#19260)
    • Move some issue methods as functions (#19255)
    • Show last cron messages on monitor page (#19223)
    • New cron task: delete old system notices (#19219)
    • Add Redis Sentinel Authentication Support (#19213)
    • Add auto logging of goroutine pid label (#19212)
    • Set OpenGraph title to DisplayName in profile pages (#19206)
    • Add pprof labels in processes and for lifecycles (#19202)
    • Let web and API routes have different auth methods group (#19168)
    • Move init repository related functions to modules (#19159)
    • Feeds: render markdown to html (#19058)
    • Allow users to self-request a PR review (#19030)
    • Allow render HTML with css/js external links (#19017)
    • Fix script compatiable with OpenWrt (#19000)
    • Support ignore all santize for external renderer (#18984)
    • Add note to GPG key response if user has no keys (#18961)
    • Improve Stopwatch behavior (#18930)
    • Improve mirror iterator (#18928)
    • Uncapitalize errors (#18915)
    • Prevent Stats Indexer reporting error if repo dir missing (#18870)
    • Refactor SecToTime() function (#18863)
    • Replace deprecated String.prototype.substr() with String.prototype.slice() (#18796)
    • Move deletebeans into models/db (#18781)
    • Fix display time of milestones (#18753)
    • Add config option to disable “Update branch by rebase” (#18745)
    • Display template path of current page in dev mode (#18717)
    • Add number in queue status to monitor page (#18712)
    • Change git.cmd to RunWithContext (#18693)
    • Refactor i18n, use Locale to provide i18n/translation related functions (#18648)
    • Delete old git.NewCommand() and use it as git.NewCommandContext() (#18552)
    • Move organization related structs into sub package (#18518)
    • Warn at startup if the provided SCRIPT_TYPE is not on the PATH (#18467)
    • Use CryptoRandomBytes instead of CryptoRandomString (#18439)
    • Use explicit jQuery import, remove unused eslint globals (#18435)
    • Allow to filter repositories by language in explore, user and organization repositories lists (#18430)
    • Use base32 for 2FA scratch token (#18384)
    • Unexport var git.GlobalCommandArgs (#18376)
    • Don’t underline commit status icon on hover (#18372)
    • Always use git command but not os.Command (#18363)
    • Switch to non-deprecation setting (#18358)
    • Set the LastModified header for raw files (#18356)
    • Refactor jwt.StandardClaims to RegisteredClaims (#18344)
    • Enable deprecation error for v1.17.0 (#18341)
    • Refactor httplib (#18338)
    • Limit max-height of CodeMirror editors for issue comment and wiki (#18271)
    • Validate migration files (#18203)
    • Format with gofumpt (#18184)
    • Prettify number of issues (#17760)
    • Add a “admin user generate-access-token” subcommand (#17722)
    • Custom regexp external issues (#17624)
    • Add smtp password to install page (#17564)
    • Add config options to hide issue events (#17414)
    • Prevent double click new issue/pull/comment button (#16157)
    • Show issue assignee on project board (#15232)
  • BUGFIXES
    • WebAuthn CredentialID field needs to be increased in size (#20530) (#20555)
    • Ensure that all unmerged files are merged when conflict checking (#20528) (#20536)
    • Stop logging EOFs and exit(1)s in ssh handler (#20476) (#20529)
    • Add labels to two buttons that were missing them (#20419) (#20524)
    • Fix ROOT_URL detection for URLs without trailing slash (#20502) (#20503)
    • Dismiss prior pull reviews if done via web in review dismiss (#20197) (#20407)
    • Allow RSA 2047 bit keys (#20272) (#20396)
    • Add missing return for when topic isn’t found (#20351) (#20395)
    • Fix commit status icon when in subdirectory (#20285) (#20385)
    • Initialize cron last (#20373) (#20384)
    • Set target on create release with existing tag (#20381) (#20382)
    • Update xorm.io/xorm to fix a interpreting db column sizes issue on 32bit systems (#20371) (#20372)
    • Make sure repo_dir is an empty directory or doesn’t exist before ‘dump-repo’ (#20205) (#20370)
    • Prevent context deadline error propagation in GetCommitsInfo (#20346) (#20361)
    • Correctly handle draft releases without a tag (#20314) (#20335)
    • Prevent “empty” scrollbars on Firefox (#20294) (#20308)
    • Refactor SSH init code, fix directory creation for TrustedUserCAKeys file (#20299) (#20306)
    • Bump goldmark to v1.4.13 (#20300) (#20301)
    • Do not create empty “.ssh” directory when loading config (#20289) (#20298)
    • Fix NPE when using non-numeric (#20277) (#20278)
    • Store read access in access for team repositories (#20275) (#20276)
    • EscapeFilter the group dn membership (#20200) (#20254)
    • Only show Followers that current user can access (#20220) (#20252)
    • Update Bluemonday to v1.0.19 (#20199) (#20209)
    • Refix indices on actions table (#20158) (#20198)
    • Check if project has the same repository id with issue when assign project to issue (#20133) (#20188)
    • Fix remove file on initial comment (#20127) (#20128)
    • Catch the error before the response is processed by goth (#20000) (#20102)
    • Dashboard feed respect setting.UI.FeedPagingNum again (#20094) (#20099)
    • Alter hook_task TEXT fields to LONGTEXT (#20038) (#20041)
    • Respond with a 401 on git push when password isn’t changed yet (#20026) (#20027)
    • Return 404 when tag is broken (#20017) (#20024)
    • Write Commit-Graphs in RepositoryDumper (#20004)
    • Use DisplayName() instead of FullName in Oauth Provider (#19991)
    • Don’t buffer doctor logger (#19982)
    • Always try to fetch repo for mirrors (#19975)
    • Uppercase first languages letters (#19965)
    • Fix cli command restore-repo: “units” should be parsed as StringSlice (#19953)
    • Ensure minimum mirror interval is reported on settings page (#19895)
    • Exclude Archived repos from Dashboard Milestones (#19882)
    • gitconfig: set safe.directory = * (#19870)
    • Prevent NPE on update mirror settings (#19864)
    • Only return valid stopwatches to the EventSource (#19863)
    • Prevent NPE whilst migrating if there is a team request review (#19855)
    • Fix inconsistency in doctor output (#19836)
    • Fix release tag for webhook (#19830)
    • Add title attribute to dependencies in sidebar (#19807)
    • Estimate Action Count in Statistics (#19775)
    • Do not update user stars numbers unless fix is specified (#19750)
    • Improved ref comment link when origin is body/title (#19741)
    • Fix nodeinfo caching and prevent NPE if cache non-existent (#19721)
    • Fix duplicate entry error when add team member (#19702)
    • Fix sending empty notifications (#19589)
    • Update image URL for Discord webhook (#19536)
    • Don’t let repo clone URL overflow (#19517)
    • Allow commit status popup on /pulls page (#19507)
    • Fix two UI bugs: JS error in imagediff.js, 500 error in diff/compare.tmpl (#19494)
    • Fix logging of Transfer API (#19456)
    • Fix panic in teams API when requesting members (#19360)
    • Refactor CSRF protection modules, make sure CSRF tokens can be up-to-date. (#19337)
    • An attempt to sync a non-mirror repo must give 400 (Bad Request) (#19300)
    • Move checks for pulls before merge into own function (#19271)
    • Fix contrib/upgrade.sh (#19222)
    • Set the default branch for repositories generated from templates (#19136)
    • Fix EasyMDE error when input Enter (#19004)
    • Don’t clean up hardcoded tmp (#18983)
    • Delete related notifications on issue deletion too (#18953)
    • Fix trace log to show value instead of pointers (#18926)
    • Fix behavior or checkbox submission. (#18851)
    • Add ContextUser (#18798)
    • Fix some mirror bugs (#18649)
    • Quote MAKE to prevent path expansion with space error (#18622)
    • Preserve users if restoring a repository on the same Gitea instance (#18604)
    • Fix non-ASCII search on database (#18437)
    • Automatically pause queue if index service is unavailable (#15066)
  • TESTING
    • Allow postgres integration tests to run over unix pipe (#19875)
    • Prevent intermittent NPE in queue tests (#19301)
    • Add test for importing pull requests in gitea uploader for migrations (#18752)
    • Remove redundant comparison in repo dump/restore (#18660)
    • More repo dump/restore tests, including pull requests (#18621)
    • Add test coverage for original author conversion during migrations (#18506)
  • TRANSLATION
    • Update issue_no_dependencies description (#19112)
    • Refactor webhooks i18n (#18380)
  • BUILD
  • DOCS
    • Update documents (git/fomantic/db, etc) (#19868)
    • Update the ROOT documentation and error messages (#19832)
    • Update document to use FHS /usr/local/bin/gitea instead of /app/... for Docker (#19794)
    • Update documentation to disable duration settings with -1 instead of 0 (#19647)
    • Add warning to set SENDMAIL_ARGS to – (#19102)
    • Update nginx reverse proxy docs (#18922)
    • Add example to render html files (#18736)
    • Make SSH passtrough documentation better (#18687)
    • Changelog 1.16.0 & 1.15.11 (#18468 & #18455) (#18470)
    • Update the SSH passthrough documentation (#18366)
    • Add contrib/upgrade.sh (#18286)
  • MISC
    • Fix aria for logo (#19955)
    • In code search, get code unit accessible repos in one (main) query (#19764)
    • Add tooltip to pending PR comments (#19662)
    • Improve sync performance for pull-mirrors (#19125)
    • Improve dashboard’s repo list performance (#18963)
    • Avoid database lookups for DescriptionHTML (#18924)
    • Remove CodeMirror dependencies (#18911)
    • Disable unnecessary mirroring elements (#18527)
    • Disable unnecessary OpenID/OAuth2 elements (#18491)
    • Disable unnecessary GitHooks elements (#18485)
    • Change some logging levels (#18421)
    • Prevent showing webauthn error for every time visiting /user/settings/security (#18385)
    • Use correct translation key for errors (#18342)


from Hacker News https://ift.tt/ixPvtX6

The Nothing Phone (1)

Perfected.

Nothing OS delivers only the best of Android. No bloatware. Just speed and a smooth experience. Hardware and software speak a single visual language, with bespoke widgets, fonts, sounds and wallpapers. Enables seamless integration with third party products.

1. Perfected.

2. Open.

3. Speed.

4. NFTs.



from Hacker News https://ift.tt/avXeFLH

Max Headroom, the Strange Pre-Internet AI Phenomenon, Is Getting Rebooted

Grab your floppy discs, crank the Duran Duran, and crack open some ice cold New Coke. Max Headroom, the perplexing fictional character played by Matt Frewer in a variety of television forms, is coming back, as per an announcement in Deadline.

Halt and Catch Fire co-creator Christopher Cantwell, working with Elijah Wood and Daniel Noah’s company SpectreVision, will bring the series to AMC. While a reboot of an old “oh, I know that!” IP can often lead to rolled eyes, anyone who has seen a SpectreVision project knows this group comes correct. (Stream Mandy or, better yet, The Greasy Strangler, if you dare.)

While the image of Max Headroom is likely familiar to most, the origin of this character can get a little murky. His first appearance was in a 1985 dystopian comedy for British television, Max Headroom: 20 Minutes Into The Future, in which a television journalist played by Frewer named Edison Carter gets his consciousness cloned and uploaded into a wacky AI television host named Max Headroom. (The name comes from the last thing Carter saw before his accident: a clearance sign reading "MAX. HEADROOM: 2.3 M.") There follows an adventure tale involving a corrupt television network overstimulating viewers to death with super-short subliminal commercials called "blipverts."


Almost simultaneously, The Max Headroom Show, in which “Max” would introduce music videos and interview guests, debuted on Britain's Channel 4. It was claimed that the glitchy character was entirely computer generated, but this was not true; it was Frewer under a lot of makeup. 

Much of the Max Headroom experiment shared a similar look and feel with Terry Gilliam's Brazil, which was released at around the same time. Both shows eventually made their way to America via MTV, and ABC later launched a series based on the fictional film, while Cinemax continued the talk show.




from Hacker News https://ift.tt/Y1PXNfx

Application Architecture: A Quick Guide for Startups

When you’ve got a great idea for a startup, application architecture is probably one of the last things on your mind. But architecting your app right the first time can save you major headaches further down the road.

So let’s take a look at a typical startup application architecture, with a particular focus on the database and how the choices that application architects make in that part of their stack can impact scale, user experience, and more.

Single-region architecture

startup reference architecture - single region

Working from the top down, modern applications begin with a user-facing front end that’s typically built with the languages of the web – HTML, CSS, JavaScript, etc.

The front end of the application is connected to the back end via a load balancer that distributes requests to instances of the application, which is written in your programming language of choice – Node.js, Python, Go, etc. These applications are typically built and deployed via shared services (for example, Netlify or Vercel) that allow the use of features like serverless functions, enabling elastic scale for the application backend without the costs associated with purchasing a bunch of dedicated machines.

From the back end, data may be passed horizontally to a data warehouse for long-term storage and analytics processing, often through a message queue or Apache Kafka.

Mission-critical data (such as transactional data) is also passed vertically from the back end of the application to the relational database, either directly or via an intermediary such as an ORM (Object-Relational Mapper) or something like GraphQL.

At the database level, even small applications typically aim for some sort of redundancy, since relying on a single database node means your entire app goes down if the node fails. However, this is typically an active/passive setup and there are some limitations around availability and complexity with this approach. Let’s look at a couple of popular options for maintaining high availability for your database in more detail.

Database considerations for single-region

The easiest way to achieve reasonably high availability is with two database nodes, configured using the active-passive model as depicted in the “before” part of the image above. With active-passive, a single database node handles all reads and writes, which are then synchronized with the passive node for backup. If the active node goes down, you can switch over to make the passive node active, and then restore the active-passive configuration once you’ve gotten the broken node back online.
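The active-passive flow described above can be sketched as a toy model. This is purely illustrative (the class and field names are invented, and real replication involves log shipping, health checks, and fencing), but it shows the key property: one node serves all traffic, and promotion only happens on failure.

```python
# Toy sketch of active-passive failover (hypothetical names, not a real
# replication implementation): one node serves all reads and writes, the
# other only receives copies, and promotion happens when the active fails.

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True

class ActivePassivePair:
    def __init__(self, active, passive):
        self.active, self.passive = active, passive

    def _failover_if_needed(self):
        if not self.active.alive:
            # Promote the standby; in reality this is an operational event.
            self.active, self.passive = self.passive, self.active

    def write(self, key, value):
        self._failover_if_needed()
        self.active.data[key] = value       # only the active node serves writes
        if self.passive.alive:
            self.passive.data[key] = value  # synchronous copy to the standby

    def read(self, key):
        self._failover_if_needed()
        return self.active.data[key]        # reads also hit the single active node

pair = ActivePassivePair(Node("a"), Node("b"))
pair.write("stock", 10)
pair.active.alive = False                   # simulate the active node failing
print(pair.read("stock"))                   # → 10 (the standby took over)
```

Note the limitation the article points out: even with two machines, every request funnels through whichever node is currently active.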

However, this approach requires a lot of complex operations work to manage and maintain, and while you’re paying for two database nodes, you’re only getting the performance of a single-node system, since a single node handles all read and write requests. If you want to send data from this database to a system like Kafka, for example, you’ll also have to build something like a transactional outbox to ensure consistency between Kafka and your database and avoid the dual write problem.
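The transactional-outbox idea mentioned above can be sketched in a few lines. This is a hedged illustration using SQLite in place of a real database; the table names, the `place_order` helper, and the relay are all hypothetical. The point is that the business row and the event destined for Kafka commit in one local transaction, so a relay can publish the event later without the dual-write problem.

```python
import sqlite3

# Illustrative schema: an orders table plus an outbox table for events.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT,"
           " published INTEGER DEFAULT 0)")

def place_order(item):
    # One atomic transaction: either both rows commit or neither does.
    with db:
        db.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        db.execute("INSERT INTO outbox (payload) VALUES (?)",
                   (f"order_created:{item}",))

def relay_once():
    # A separate relay process would read unpublished events and send them
    # to Kafka; here we just mark them published and return the payloads.
    rows = db.execute("SELECT id, payload FROM outbox WHERE published = 0").fetchall()
    for event_id, _payload in rows:
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (event_id,))
    return [payload for _, payload in rows]

place_order("widget")
print(relay_once())  # → ['order_created:widget']
```

If the process crashes between the insert and the publish, the event is still sitting in the outbox table, so nothing is lost or double-reported.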

[Image: startup reference architecture - single region with CockroachDB]

In the architecture above, we can see how using CockroachDB offers an alternative approach. CockroachDB is an active-active database, so if (for example) you’re running CockroachDB with three nodes, you’re getting the performance benefits of having three nodes because reads and writes are spread across the nodes rather than all being routed through a single node as in active-passive.

More importantly, an active-active database provides inherent resilience and very little operational complexity is needed to ensure uptime in the case of node failures. Data is distributed across multiple nodes and every node is an endpoint, so all nodes can service both reads and writes. This active redundancy allows you to not just survive failures but also perform database upgrades and schema changes without downtime.

The CockroachDB implementation is also easy to scale up and down by, for example, adding or removing nodes from a CockroachDB cluster. With CockroachDB serverless, scaling is entirely automated, and your database resources will scale up and down in real time in response to the needs of your application.

Additionally, CockroachDB’s CDC (Change Data Capture) feature makes it easy to store the same data in your transactional database and Kafka without the possibility of introducing inconsistencies.

Scaling the database to multiple regions

[Image: startup reference architecture - multi-region]

In general, an application doesn’t change much when moving from single-region to multi-region. However, the relational database can present a significant challenge when your app starts to expand beyond a single region.

Consider, for example, an ecommerce application using the architecture depicted above. If a user in Idaho makes a purchase, that transaction will be stored in the Idaho instance of the application database, but the South Carolina instance also needs that information immediately so that (for example) a user in South Carolina can’t buy an item that just went out of stock because an Idaho user purchased the last one.
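The stock example above boils down to a check-and-decrement that must be atomic across regions. As a toy illustration (SQLite standing in for the real database, with an invented schema), a conditional UPDATE makes the check and the decrement a single statement, so two buyers cannot both claim the last item:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, stock INTEGER)")
db.execute("INSERT INTO products (id, stock) VALUES (1, 1)")  # last item in stock

def buy(product_id):
    # The WHERE clause makes check-and-decrement one atomic statement.
    cur = db.execute(
        "UPDATE products SET stock = stock - 1 WHERE id = ? AND stock > 0",
        (product_id,),
    )
    return cur.rowcount == 1  # True if this purchase claimed the item

print(buy(1))  # → True   (the Idaho user gets the last one)
print(buy(1))  # → False  (the South Carolina user is correctly refused)
```

In a multi-region deployment, this only stays safe if both regions observe the same row, which is exactly the synchronization problem the article goes on to discuss.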

Database considerations for multi-region

Typically, organizations will synchronize data between two or more active database instances, one within each region. However, this approach presents an operational challenge and doesn’t offer optimal performance. Maintaining the synchronization is difficult and connection issues can introduce inconsistencies across the repositories. Further, performance is likely to be particularly poor if a region’s database node goes down, since all of that region’s reads and writes will then need to be routed to another region’s node, which suddenly has to handle double the workload.

[Image: startup reference architecture - multi-region with CockroachDB]

Again, CockroachDB presents a very different approach. Built from the ground up for geographic scale, CockroachDB allows for multi-region deployments to be treated like a single logical database. In other words: your application can treat the database as though it were a single Postgres instance. CockroachDB handles the geographic distribution itself, automatically.

Every node in CockroachDB (in every region) can service reads and writes and access all of the data within the cluster. It doesn’t synchronize data after the fact; it uses distributed consensus to ensure database writes are consistent all the time, everywhere.

And again, because CockroachDB is distributed by nature, all data is replicated across multiple nodes in each region, ensuring superior performance and availability. If one node in a region goes down, read and write requests will be handled by the other nodes in that region, with no need to introduce the latency issues that come with routing traffic to another region.

You can scale later, but architect for scale now

In the days when scaling a relational database meant dealing with manual sharding, it made sense for startups to ignore the problem of database scale until they were actually likely to be confronted with it.

Today, however, distributed SQL databases such as CockroachDB can offer easy elastic scale, and building with them is as simple and affordable as building with a traditional option like Postgres. Given that, it makes sense to build from the outset with a distributed database, rather than having to deal with the challenges and headaches of trying to migrate from a legacy solution when you outgrow it down the road.

With the rise of serverless options such as CockroachDB Serverless, you can actually get a distributed database with automated elastic scale for free. With CockroachDB, it’s easy and affordable to architect your application for scale now, even if you won’t need to scale until later.



from Hacker News https://ift.tt/fbV9asz

Opppppsss you did it again

shellfirm

Opppppsss you did it again? 😱 😱 😰

How do I save myself from myself?

  • rm -rf *
  • git reset --hard (before hitting the Enter key?)
  • kubectl delete ns (stop! You are going to delete a lot of resources)
  • And many more!

Do you want to learn from other people's mistakes?

shellfirm will intercept any risky patterns (predefined or user-added custom ones) and immediately prompt a small challenge to double-verify your action. Think of it as a CAPTCHA for your terminal.

$ rm -rf /
#######################
# RISKY COMMAND FOUND #
#######################
* You are going to delete everything in the path.

Solve the challenge: 8 + 0 = ? (^C to cancel)

How does it work?

shellfirm will evaluate all the shell commands behind the scenes. If a risky pattern is detected, you will immediately get a prompt with the relevant warning to verify your command.
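As a rough sketch of the kind of check shellfirm performs (the real tool is written in Rust; the patterns below are simplified versions of the examples in its config, not its actual implementation):

```python
import re

# Simplified risky-pattern table: (regex, warning shown to the user).
RISKY_PATTERNS = [
    (r"rm.+(-r|-f|-rf|-fr)", "You are going to delete everything in the path."),
    (r"git\s+reset\s+--hard", "This command is going to reset all your local changes."),
    (r"kubectl\s+delete\s+ns", "You are going to delete a lot of resources."),
]

def check_command(command):
    """Return the warning for the first risky pattern matched, else None."""
    for pattern, warning in RISKY_PATTERNS:
        if re.search(pattern, command):
            return warning
    return None

print(check_command("rm -rf /"))  # → You are going to delete everything in the path.
print(check_command("ls -la"))    # → None
```

In the real tool, a matched pattern triggers the challenge prompt (Math, Enter, or Yes) before the shell is allowed to run the command.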

Example

Installation

1. Install via brew

brew tap kaplanelad/tap && brew install shellfirm

Or download the binary file from the releases page, unzip it, and move it to the /usr/local/bin folder.

2. Select your shell type:

If you use Oh My Zsh

curl https://raw.githubusercontent.com/kaplanelad/shellfirm/main/shell-plugins/shellfirm.plugin.zsh --create-dirs -o ${ZSH_CUSTOM:-~/.oh-my-zsh/custom}/plugins/shellfirm/shellfirm.plugin.zsh
  • Add shellfirm to the list of Oh My Zsh plugins that are loaded when Zsh starts (inside ~/.zshrc):
  • ℹ️ Open a new shell session

More supported shells

Verify installation

$ mkdir /tmp/shellfirm
$ cd /tmp/shellfirm
$ git reset --hard

You should get a shellfirm prompt challenge.

If you didn't get the prompt challenge:

  1. Make sure that shellfirm --version returns a valid response.
  2. Make sure that you downloaded the Zsh plugin and added it to the Oh My Zsh plugins in .zshrc.

Risky commands

We have predefined a baseline of risky command groups that are enabled by default; these contain risky commands that might be destructive.

Group             | Enabled by default | Command to enable
------------------|--------------------|------------------
base              | true               | -
git               | true               | -
fs                | true               | -
fs-strict         | false              | shellfirm config update --check-group fs-strict
kubernetes        | false              | shellfirm config update --check-group kubernetes
kubernetes-strict | false              | shellfirm config update --check-group kubernetes-strict
heroku            | false              | shellfirm config update --check-group heroku

Custom checks definition examples

shellfirm creates a configuration file by default at ~/.shellfirm/config.yaml. Make sure that you only edit the enable field (in case you want to disable a specific check); all other fields are managed by the shellfirm command (shellfirm config --help).

challenge: Math # Math, Enter, Yes

includes: 
  - base
  - fs
  - git

checks:
  - test: git reset
    method: Contains
    enable: true
    description: "This command is going to reset all your local changes."
    from: git
    challenge: Default
  - test: "rm.+(-r|-f|-rf|-fr)*"
    method: Regex
    enable: true
    description: "You are going to delete everything in the path."
    from: fs
    challenge: Default
  - test: ">.+/dev/sda"
    method: Regex
    enable: true
    description: "Writing the data directly to the hard disk drive and damaging your file system."
    from: fs
    challenge: Default
  - test: "mv+.*/dev/null"
    method: Regex
    enable: true
    description: "The files will be discarded and destroyed."
    from: fs
    challenge: Default

ℹ️ To define custom checks that are not part of the shellfirm baseline, add new checks to the config.yaml with the field from: custom.

  - test: "command to check"
    method: Regex
    enable: true
    description: "Example of custom check."
    from: custom
    challenge: Default

ℹ️ To define a different challenge for a check, change the field challenge: Default to a different challenge type.

Add new group checks

$ shellfirm config update --check-group {risky-command-group-a} {risky-command-group-b}

Remove group checks

$ shellfirm config update --check-group {group} {group} --remove

Disable specific checks

Edit the configuration file in ~/.shellfirm/config.yaml and change the check to enable:false.

Change challenge:

Currently we support 3 different challenges when a risky command is intercepted:

  • Math - The default challenge, which requires you to solve a math question.
  • Enter - Requires only pressing Enter to continue.
  • Yes - Requires typing yes to continue.

You can change the default challenge by running the command:

$ shellfirm config challenge --challenge Math

At any time you can cancel a risky command by hitting ^C.

To Upgrade shellfirm

Contributing

Thank you for your interest in contributing! Please refer to contribution guidelines for guidance.



from Hacker News https://ift.tt/wCFS9jK

I've been making a JavaScript sandbox alone for 6 years



from Hacker News https://playcode.io/

Saturday, July 30, 2022

What Is EXIF Data in Digital Photos?

In the digital age, there’s a lot more to a photograph than the image itself. When most digital cameras capture an image, they record certain parameters and write them into the image file for later use. These parameters are called metadata and are stored in the Exchangeable Image File Format, or EXIF for short.

EXIF is useful, but can also have negative impacts in certain cases. This guide will teach you about EXIF, its advantages and disadvantages, and how to use it in photography.

Note: Although metadata is stored as EXIF, the two terms are often used interchangeably.

Table of Contents

EXIF Parameters

It’s important to understand that EXIF is much more than just the shutter speed, aperture, and ISO of your image. While you might use those the most, EXIF encompasses far more. Here is a list of common parameters it can include.

  • Shutter speed
  • Aperture
  • ISO
  • Camera model, manufacturer, and serial number
  • Lens model, manufacturer, and serial number
  • Focal length when the photo was taken
  • White balance
  • Metering settings
  • Flash settings (if one was used)
  • Image resolution
  • Color space
  • Date and time that the photo was taken
  • Post-processing adjustments
  • GPS coordinates of the photo’s location (if your camera has GPS and it is turned on)

How to View EXIF Data

There are many ways to access and view EXIF, from websites to built-in tools that come with your computer. This may come in handy when trying to view your own EXIF, and could also be useful if you are trying to find EXIF in an image that is not yours.

Using a Website to View EXIF Data

Perhaps one of the easiest ways to view EXIF data is to upload your image(s) to a website. This approach is easy to find and operate, but it can also take longer and require more effort, since you need an internet connection to upload your images. It is difficult to use with a batch of images, so it’s best if you’re only looking for the EXIF of a small number of images. Some websites may give you more data than the built-in tools on your computer, such as adjustments made in post-production.

One potential risk of using online EXIF viewers is that you are uploading your images to a website that could use your images for other purposes. Before using an online EXIF viewer, be sure to research potential risks and read reviews.

Here are a few websites that you may find useful in viewing EXIF data:

[Image: screenshots of EXIF-viewing web tools]

Adobe Lightroom’s Metadata Panel

Perhaps one of the most organized and useful ways to view metadata is via Adobe Lightroom’s metadata tool. To access it, click the Library module > Library Filter: Metadata.

This will allow you to search your catalog by metadata, and you can add or remove parameters as you choose. This will give you a quick glance at different parameters and their images, and it will help you search your entire catalog if you’re looking for something specific regarding metadata.

Built-in EXIF Viewer on a Mac

To view an image’s metadata on Mac, simply open it in Preview, which is usually the default application to display images. From here, click Tools > Show Inspector (or Command + I) and click the “EXIF” tab at the top of the Inspector.

This won’t give you as much data as some of the websites listed above, but it will give you the most commonly used EXIF parameters.

If you’re looking to just quickly glance at a few parameters such as shutter speed, focal length, aperture, and resolution, you can right-click on an image in its folder and press “Get Info.”

This will give you a quick glance at parameters you may be looking for, and can easily be done with a batch of images if they are all selected.

Built-in EXIF Viewer in Windows

To view an image’s metadata on Windows, right click the image in its folder and select “Properties.”

Click the “Details” tab and EXIF data will be listed. You’ll also see an option to edit and remove EXIF, which is discussed later in this article.

Advantages and Disadvantages of EXIF Data

In a world where EXIF is often shared over social media, there has been discussion about certain ethics and practices of publicizing the settings you used in a photo, your location, and other parameters. Here are a few pros and cons of sharing and using EXIF data.

Pros

1. Teaching and Learning. EXIF is a great teaching and learning tool. If you want to learn the right balance between shutter speed, aperture, and ISO in certain scenarios, find photographers who share their EXIF data and try to replicate their tactics. Of course, it would be even better to learn how to adjust your settings in an adaptive way, so see the corresponding con below.

Tip: Study EXIF by going through data and asking yourself if you can figure out why a certain parameter is the way it is. This will help you understand the “why” behind certain metadata, and allow you to better understand why you might change a parameter in the field.

2. Organizing Post-Production. EXIF can help you stay organized in post-production. Maybe you have a bunch of shots at high ISO and you want to run a batch denoise algorithm on those images. Knowing how to read their metadata is important for knowing which images you might want to process. Use Adobe Lightroom’s metadata search feature to search through your entire catalog for certain parameters.

3. Consistency Between Photos. If you need to stay consistent (e.g., you are doing headshots for a company that hires new people throughout the year and wants similar photos for all of them), you can refer to your own EXIF to stay as consistent as possible. If you’re outdoors, environmental factors can influence your consistency, but the images should be reasonably similar if your settings are in the ballpark.

4. Crime Evidence. EXIF has been extremely useful in crime investigation, especially with the advancement of GPS technology. Photos shared and sent with location metadata stored in EXIF files have helped determine the whereabouts of people involved in criminal activity. Even simpler than GPS are the date and time of when images were taken, which are recorded in EXIF and can also be used as evidence.

Cons

1. Straight Copying Isn’t Useful. Simply copying someone’s settings won’t be useful on a shoot where you need to adapt, such as a portrait shoot, adventure shoot, and most types of commercial photography. Using jazz musicians as an analogy, they are generally respected for their ability to improvise and use what they know about music to make new, equally good, or better ideas. Sure, they would sound great if they played a complex solo exactly the same as someone else, but that isn’t useful in many scenarios and leaves no room for creative interpretation. Instead of copying someone’s EXIF, learn how EXIF combines to make a great image, and use that to your advantage.

2. Lack of Originality. With the point above, originality can be compromised if other photographers are using your exact metadata to produce an image like yours.

3. Privacy Issues. Certain parameters, especially those involving location, can have privacy-related impacts on people and the environment. Hard-to-reach locations that have been recorded in EXIF have been “discovered” and ruined by erosion, littering, and other detrimental human impacts. On social media apps especially, there are security concerns regarding sharing images with a specific location, since many smartphones record GPS coordinates in their EXIF and apps will provide an option to share that.

4. Inconsistent Standards. EXIF standards are not necessarily maintained and kept up to date. This can make for confusing, unnecessary, and counterintuitive data.

How to Remove EXIF Data From Your Photos

EXIF can be an invasion of privacy in a couple of different ways. Luckily, there are lots of methods to remove EXIF from your photos, selectively and entirely.
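Beyond the GUI tools below, EXIF can also be stripped programmatically. As a small sketch, assuming the third-party Pillow library is installed, re-saving just the pixel data drops the metadata (the tag number 271 is the standard EXIF "Make" tag; "ExampleCam" is an invented value for demonstration):

```python
import io
from PIL import Image  # assumes the Pillow library is installed

def strip_exif(jpeg_bytes):
    """Re-encode a JPEG without passing its metadata along."""
    img = Image.open(io.BytesIO(jpeg_bytes))
    out = io.BytesIO()
    img.save(out, format="JPEG")  # no exif= argument, so metadata is dropped
    return out.getvalue()

# Build a tiny in-memory image that carries one EXIF tag (271 = camera Make).
exif = Image.Exif()
exif[271] = "ExampleCam"
buf = io.BytesIO()
Image.new("RGB", (8, 8), "red").save(buf, format="JPEG", exif=exif.tobytes())

before = Image.open(io.BytesIO(buf.getvalue())).getexif()
after = Image.open(io.BytesIO(strip_exif(buf.getvalue()))).getexif()
print(dict(before))
print(dict(after))
```

As with ImageOptim, this is irreversible for the stripped copy, so keep an untouched original if you want the metadata preserved somewhere.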

Removing EXIF from Photos on Mac

macOS doesn’t have a built-in EXIF remover, but it will allow you to remove the location from an image easily. Simply open your image in Preview, open the Inspector (Command + I), click the “GPS” tab and select “Remove Location Info” at the bottom.

If you want to remove all of the metadata, ImageOptim is a free, easy-to-use program that will strip an image of all of its metadata in seconds. Be aware that this is irreversible, so if you want a copy of the metadata somewhere, it’s best to make two copies of the image and only strip one.

At this time, it doesn’t appear that ImageOptim supports Apple .HEIC compression, so you will have to convert photos to .JPG or .PNG. There are various settings that you can change, but make sure to do it before dragging and dropping images into the software.

Removing EXIF from Photos on Windows

Windows has a built-in EXIF remover and will let you selectively remove EXIF from an image. Right click the image, select “Properties” and then click the “Details” tab. You’ll see a button that says “Remove Properties and Personal Information,” which will allow you to selectively remove parameters from the image’s EXIF. You can also simply create a copy with all possible properties removed.

Removing EXIF from Photos on iOS and Android

First of all, it’s possible to disable geotagging (saving GPS data to your photo) in Settings or your Camera app. If you’d still like geotagging but want to remove the location from a specific photo on iOS, simply go into the Photos app, swipe up, and select “Adjust” under the map.

From here, you are able to change or remove the location.

Both iOS and Android have apps to completely remove metadata from photos. Check out Exif Metadata for iOS and Photo Metadata Remover for Android as two free options.

Conclusion

EXIF is the file that stores certain parameters of an image called metadata. You can use it to your advantage as a photographer by learning from your own and other metadata, and it can also be harmful in terms of security and privacy.

Understanding EXIF is important in becoming a better photographer so that you can take advantage of its pros and be aware of its cons.



from Hacker News https://ift.tt/4cDhBaV

Taking a look at the Rogers outage CRTC letter

Last week, the CRTC made public a redacted version of the response Rogers filed with answers to the regulator's questions. I started my career in wireless telecommunications at a competitor to Rogers here in Canada, and wanted to dig through the outage. I've been out of the industry for a few years now, but my somewhat dated perspective may be of some use here.

Unfortunately, the redacted version removes almost all the useful information for other carriers or other industries to learn from this outage. But I still dug through the report to try and find what useful information I could.

Resources

Regulator Filings

If you're instead interested in media-level analysis, here are some of the media reports I've found.

CRTC Letter

The letter itself basically outlines that the Canadian Radio-television and Telecommunications Commission (CRTC) wants to inquire into the outage that began on July 8th, 2022. It also lays out the basis for this outreach, including disruptions to business, emergency services, and more.

You can read the full letter above, but effectively it's an outline of questions that Rogers has been asked to respond to. I'll go through what I see as the important questions, adding perspective as we work through the response.

Rogers Response

At a high level, Rogers starts by confirming that they had identified the cause of the outage.

We have identified the cause of the outage to a network system failure following an update in our core IP network during the early morning of Friday July 8th. This caused our IP routing network to malfunction. To mitigate this, we re-established management connectivity with the routing network, disconnected the routers that were the source of the outage, resolved the errors caused by the update and redirected traffic, which allowed our network and services to progressively come back online later that day. While the network issue that caused the full-service outage had largely been resolved by the end of Friday, some minor instability issues persisted over the weekend.

What I find interesting about this is the reference to minor instability issues that persisted over the weekend.

When I worked on a similar outage, where the underlying core network went down for 20 or 30 minutes, my team spent the next 10 or so hours finding and fixing the glitches in the cellular network, mainly broken states within different pieces of equipment. Nothing exposes those bugs as well as taking out the underlying network at all layers at the same time.

For example, think of an HTTP proxy. A client connects to the proxy and makes a request; the proxy holds some request state and sends the request to a server. You probably know exactly what happens, and it's well tested, if the upstream server goes down: just return an error to the client. If the client loses its connection, that's well understood too: complete the upstream request and discard the response. But what happens if both the upstream and client networks fail at the same time? You start exercising a code path that combines failures that usually aren't as well tested. Combine this with some missing internal state, and you start having failures.
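The combined-failure path can be made concrete with a toy proxy. This is purely illustrative (ClientGone and UpstreamDown are hypothetical stand-ins for real network errors), but it shows how the rarely exercised branch only appears when both sides fail:

```python
# Toy proxy sketch: each return value names which failure path was taken.

class ClientGone(Exception): pass
class UpstreamDown(Exception): pass

def proxy(send_upstream, reply_to_client):
    request = "GET /"                            # per-request state held by the proxy
    try:
        response = send_upstream(request)
    except UpstreamDown:
        try:
            reply_to_client("502 Bad Gateway")   # well-tested single-failure path
        except ClientGone:
            return "both-failed"                 # the rarely exercised combined path
        return "upstream-failed"
    try:
        reply_to_client(response)
    except ClientGone:
        return "client-failed"                   # discard the completed response
    return "ok"

def dead_upstream(_request):
    raise UpstreamDown()

def dead_client(_response):
    raise ClientGone()

print(proxy(dead_upstream, dead_client))  # → both-failed
```

The "both-failed" branch is exactly the kind of path that a total network outage exercises everywhere at once, which is why the cleanup took hours rather than minutes.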

Especially in cellular networks, there is all sorts of state distributed around the network that needs to be kept in sync for different call procedures. This can result in things like being able to send but not receive an SMS message.

Our engineers and technical experts have been and are continuing to work alongside our global equipment vendors to fully explore the root cause and its effects. 

This is another interesting point that might be missed by many who don't know the industry. Telecommunication companies primarily operate as integrators of vendors' equipment. While this isn't always the case, most of the equipment used to build and run the network will be purchased from vendors. So if you want a cellular radio for LTE, you may buy that particular piece of equipment from Nokia, Ericsson, Huawei, or others.

In this model of buying equipment from vendors, there is often an incentive to blame the vendor for outages. Even harder, at times, is convincing another company to work on a feature or design change that makes configuration errors less likely. And like any complex situation, there is probably plenty of blame to share between the operator and the vendor.

Additionally, Rogers will work with governmental agencies and our industry peers to further strengthen the resiliency of our network and improve communication and co-operation during events like this. Most importantly, we will explore additional measures to maintain or transfer to other networks 9-1-1 and other essential services during events like these.

I'll dig into this deeper later on, but the failure of 911 services, especially for cell phones, seems like an incredibly bad design oversight. Cell phones, when not attached to a network, can already place an emergency call on any network the phone can find. By continuing to advertise the Rogers network throughout the outage, Rogers caused phones to stick to the broken network, and likely caused any unattached phone to scan for an available network and still not find a working 911 path.

Questions and Answers

About the outage

Provide a complete and detailed report on the service outage that began on 8 July 2022

Unfortunately, it looks like Rogers preferred to keep the full details confidential; the full timeline was attached as a confidential appendix. But some high-level details are included.

The network outage experienced by Rogers on July 8th was the result of a network update that was implemented in the early morning. The business requirements and design for this network change started many months ago. Rogers went through a comprehensive planning process including scoping, budget approval, project approval, kickoff, design document, method of procedure, risk assessment, and testing, finally culminating in the engineering and implementation phases. Updates to Rogers’ core network are made very carefully.

For those outside of telecom, method of procedure may be unfamiliar. While different companies may operate differently, the method of procedure is basically a document that outlines how a change in the network should be executed. It can be as detailed as every command someone in operations should run to make the change, step by step. It may also contain pre-checks that should be executed.

The philosophy is a sort of separation of duties, where one engineer writes the procedure for the change to be executed, and then someone tasked with operations is responsible for executing it. I don't know if this is how Rogers uses MOPs, however.

Maintenance and update windows always take place in the very early morning hours when network traffic is at its quietest.

I can't comment as to Rogers; however, I know that at other big telecoms in Canada this isn't true for every change. There are plenty of changes that are deemed non-risky and will be executed whenever convenient. I've also seen different managers and agendas come into play that push against change windows in the name of efficiency, and often a back and forth on the balance of stability versus getting things done.

The configuration change deleted a routing filter and allowed for all possible routes to the Internet to pass through the routers. As a result, the routers immediately began propagating abnormally high volumes of routes throughout the core network. Certain network routing equipment became flooded, exceeded their capacity levels and were then unable to route traffic, causing the common core network to stop processing traffic. 

This twitter thread likely covers this side of the impact better than I can: https://twitter.com/atoonk/status/1550896347691134977

The Rogers outage on July 8, 2022, was unprecedented. As discussed in the previous response, it resulted during a routing configuration change to three Distribution Routers in our common core network. Unfortunately, the configuration change deleted a routing filter and allowed for all possible routes to the Internet to be distributed; the routers then propagated abnormally high volumes of routes throughout the core network. Certain network routing equipment became flooded, exceeded their memory and processing capacity and were then unable to route and process traffic, causing the common core network to shut down. As a result, the Rogers network lost connectivity internally and to the Internet for all incoming and outgoing traffic for both the wireless and wireline networks for our consumer and business customers.

I'm going to nitpick calling this an unprecedented outage. Based on what I can get out of the report, without being able to see the full internal analysis, this seems fairly predictable. The explanation is basically that a change removed a piece of configuration that was essential for the routers to function. Without that configuration, routers within the network went into CPU overload processing routing updates.

Someone knew about this and placed that configuration on the router.

This is a known failure mode for most routing equipment. These big routers seem like powerful machines, but they tend to be more of a fast path for pushing many millions of packets, with separate compute for sending and receiving signalling about where those packets should go. On the control-plane side, which does the signalling, there are often many ways to overload the router, which will cause peers to think it has died.
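A toy model makes the failure mode concrete. The numbers below are illustrative only (a real full BGP table was roughly 900k prefixes in 2022, and real routers fail in messier ways than a clean exception), but the shape is the same: delete the filter and the control plane is asked to hold far more routes than it can handle.

```python
# Toy model of a route filter protecting a limited control plane.
ROUTE_CAPACITY = 1_000          # hypothetical control-plane limit
INTERNAL_ROUTES = 500           # what the core normally carries
FULL_INTERNET_TABLE = 900_000   # rough size of a full BGP table

def routes_accepted(advertised, route_filter):
    accepted = [r for r in advertised if route_filter(r)]
    if len(accepted) > ROUTE_CAPACITY:
        raise OverflowError("control plane flooded; router stops processing traffic")
    return len(accepted)

advertised = list(range(FULL_INTERNET_TABLE))

# With the filter in place, only internal routes pass:
print(routes_accepted(advertised, lambda r: r < INTERNAL_ROUTES))  # → 500

# The change deleted the filter, so every advertised route passes:
try:
    routes_accepted(advertised, lambda r: True)
except OverflowError as e:
    print(e)  # → control plane flooded; router stops processing traffic
```

The configuration someone originally placed on the router was, in effect, the `route_filter` here: invisible when present, catastrophic when removed.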

How did Rogers prioritize reinstating services and what repairs were required?

The prioritization of service restoration was always dependent on which service was most relied upon by Canadians for emergency services. As wireless devices have become the dominant form of communicating for a vast majority of Canadians, the wireless network was the first focus of our recovery efforts. Subsequently, we focused on landline service, which remains another important method to access emergency care. We then worked to restore data services, particularly for critical care services and infrastructure.

I suspect the reality is a bit more nuanced: multiple teams were probably looking at the equipment they were responsible for in parallel. If there was a glitch for some subscribers in the LTE network, for example, the people who focus on the LTE network were likely addressing it. But from a total-outage perspective, I'm sure teams were focusing more or less in this order.

Having worked big outages like this, I know that once the underlying IP network is restored and the upper-layer network comes back up, there will be a plethora of impacts, alarms, and metrics that need to be sorted through.

When I worked a similar outage for a competitor, the team would find something like call failure rates running 5x normal, but also trending back towards normal. Then it's a tough call: will it return to normal on its own? Maybe customers are rebooting their phones, so it only looks like it's recovering because customers are taking their own actions. What options do we have to clean up the call states that are leading to those failures? Maybe the only tool available doesn't know which devices are and aren't working, so the only option is to reset every device. Is resetting every device worth it to move from a 2% failure rate to a normal 0.4% failure rate?
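That trade-off can be made concrete with a back-of-envelope calculation. This is a toy sketch with entirely made-up numbers (call volumes, recovery times), not anything from the report:

```python
# Toy model of the "reset every device vs. wait it out" decision.
# All numbers are illustrative assumptions, not data from the outage.

def expected_failed_calls(failure_rate, calls_per_min, minutes):
    """Failed calls over a window, assuming a flat failure rate."""
    return failure_rate * calls_per_min * minutes

calls_per_min = 10_000
normal, elevated = 0.004, 0.02   # 0.4% baseline vs. 2% elevated rate
reset_outage_min = 5             # assumed downtime while devices re-register
recovery_min = 120               # assumed time for the rate to decay naturally

# Option A: wait, pessimistically assuming the elevated rate persists.
wait = expected_failed_calls(elevated, calls_per_min, recovery_min)

# Option B: reset everyone (brief 100% failure), then normal rate after.
reset = (expected_failed_calls(1.0, calls_per_min, reset_outage_min)
         + expected_failed_calls(normal, calls_per_min,
                                 recovery_min - reset_outage_min))

print(f"wait:  ~{wait:,.0f} failed calls")   # ~24,000
print(f"reset: ~{reset:,.0f} failed calls")  # ~54,600
```

With these particular assumptions the mass reset is clearly worse, which is exactly why these calls are so hard to make under pressure: the answer flips depending on numbers you don't reliably know at the time.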

What measures or steps were put in place in the aftermath of the earlier-mentioned April 2021 outage, and why they failed in preventing this new outage?

Everything substantive in the response to this question was redacted. That's unfortunate, because the wider industry, including non-telecom disciplines like SRE, could learn substantially from how telecoms operate networks as reliable as they are.

Basically Rogers did lots of vague things to prevent similar failures.

How did the outage impact Rogers’ own staff and their ability to determine the cause of the outage and restore services?

At the early stage of the outage, many Rogers’ network employees were impacted and could not connect to our IT and network systems.  This impeded initial triage and restoration efforts as teams needed to travel to centralized locations where management network access was established. To complicate matters further, the loss of access to our VPN system to our core network nodes affected our timely ability to begin identifying the trouble and, hence, delayed the restoral efforts.  

This is the inherent difficulty in managing network systems. In my experience the network architectures tend to be layered, but when you're also the ISP I can imagine the difficulty of keeping that separation. And problems are rare enough that it gets difficult to predict exactly which outages will knock out employee access.

A lot of the carriers also lease infrastructure from each other, so one big telco can cause impacts in another.

I almost wonder if there is room for something like Starlink here, as satellite infrastructure would be almost guaranteed to provide diverse network access to a location without running additional wires in the ground.

Having experienced a similar failure, when I noticed my home internet and cell phone were both offline, I started driving to the office to get a stable connection to production. That was a long day.

Extent to which Rogers sought or received assistance from other TSPs in addressing the outage or situation arising from the service interruption?

In order to allow our customers to use Bell or TELUS’ networks, we would have needed access to our own Home Location Register (“HLR”), Home Subscriber Server (“HSS”) and Centralized User Database (“CUDB”). This was not possible during the incident. 

What Rogers is describing here is that the databases used to track and authenticate mobile devices were also unavailable. In cellular mobility, the network can be thought of as two networks with two different purposes. There is the visited network, more or less the radio towers you connect to, and the home network, which is what actually gives you internet, cellular, SMS, and other services. When you're on your own carrier, you're using your own carrier's visited and home networks.

But when you go somewhere else, say the United States, you change your visited network and keep your home network. What Rogers is basically saying is that their home network was broken, so the other carriers weren't able to help, because they wouldn't be able to authenticate devices or connect them back to that home network.

There is some tech that allows internet connections to use only the visited network, but when I was in the industry it was not commonly deployed.
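The visited/home split above can be sketched as a simple model. This is my own simplification for illustration, not anything resembling real core-network code:

```python
# Simplified model of cellular roaming: the visited network provides the
# radio attach, but authentication and services come from the home network.
# This is the author's illustration, not code from any standard.

HOME_NETWORK_UP = False  # Rogers' HLR/HSS/CUDB were unreachable in the outage

def attach(visited_network, home_network_up):
    """A roaming attach succeeds only if the home network can
    authenticate the subscriber and anchor its services."""
    if not home_network_up:
        return f"attach via {visited_network} rejected: home network unreachable"
    return f"attached via {visited_network}, services from home network"

# Even if Bell or TELUS offered their towers as visited networks, every
# attach would still have to reach Rogers' (down) home network databases.
result = attach("Bell", HOME_NETWORK_UP)
print(result)  # attach via Bell rejected: home network unreachable
```

This is why lending towers alone couldn't have helped: the dependency runs back through the broken side.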

Furthermore, given the national nature of this event, no competitor’s network would have been able to handle the extra and sudden volume of wireless customers (over 10.2M) and the related voice/data traffic surge. If not done carefully, such an attempt could have impeded the operations of the other carriers’ networks. 

This is a key point. I don't know that any of the other major carriers would be able to predict how the sudden influx would affect their networks. While telecom is trying to move towards a model a lot more like SaaS providers that can autoscale equipment, I don't know how successful this would be.

Let's pick something simple: the number of IP addresses available to be assigned to phones. The other operators likely only provision enough to cover their own needs. Taking over another Canadian carrier's entire subscriber base isn't an expected failover pattern, so there's no provisioning to handle it. And IP addresses are likely only 1 of 10,000 different capacity constraints that go into a mobile network.
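A quick back-of-envelope shows the shape of the problem. The pool sizes and headroom here are hypothetical assumptions, chosen only to illustrate:

```python
# Hypothetical capacity check: could a competitor's address pool absorb
# Rogers' reported 10.2M wireless customers? All provisioning numbers
# below are assumptions for illustration, not real carrier data.

own_subscribers = 9_000_000        # assumed competitor subscriber base
headroom = 1.2                     # assumed 20% spare capacity
pool_size = int(own_subscribers * headroom)   # 10.8M addresses

incoming = 10_200_000              # Rogers' reported wireless customers
total_needed = own_subscribers + incoming

shortfall = total_needed - pool_size
print(f"pool: {pool_size:,}  needed: {total_needed:,}  short: {shortfall:,}")
```

Even with generous headroom, the pool falls millions of addresses short, and that's just one constraint out of thousands.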

My 2 cents is that the most likely path to even considering something like this would be more akin to multi-SIM support, where you would have credentials on your SIM card for both Rogers and a competitor's network and the device could switch if warranted. Then the possibility of this failover goes into the capacity planning for the networks and applies only to customers who need it.

Impact on Emergency Services

The impact on emergency services is really why I wanted to dig into this report, to try to understand specifically why Rogers cell phones were unable to make emergency calls. Having done some small work on 911 services for cell phones, I know that normally any device is able to attach to any network, unauthenticated, to make emergency calls. That means that while the Rogers core network was down, something was causing the Rogers radio network to keep advertising that it could take emergency calls. Otherwise the cell phones should have scanned for any available wireless network and done an emergency attach to get emergency services.

You don't need a SIM card in your phone, a valid account, or anything else to make a 911 call.
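The handset behaviour described above can be sketched as a simple selection rule. This is my own simplification of emergency network selection, not code from any real baseband:

```python
# Simplified model of how a handset picks a network for an emergency call.
# Real baseband behaviour is far more complex; this is only an illustration.

def pick_network_for_911(serving_network, scan):
    """Return the network an emergency call would be attempted on."""
    # If the phone is still camped on a cell that advertises service, it
    # places the emergency call there -- even if the core behind that
    # cell is down and the call will fail.
    if serving_network is not None:
        return serving_network
    # Only with no serving cell ("No Service") does the phone scan and
    # perform an unauthenticated emergency attach on another network.
    available = scan()
    return available[0] if available else None

# During the outage: the Rogers RAN stayed up while the core was down,
# so camped phones kept trying Rogers.
assert pick_network_for_911("Rogers", lambda: ["Bell", "TELUS"]) == "Rogers"

# Had the RAN been shut down, the same phone would have failed over.
assert pick_network_for_911(None, lambda: ["Bell", "TELUS"]) == "Bell"
```

This is the crux of the 911 problem: the decision to fail over sits with the handset, and the handset only fails over when the broken network stops advertising itself.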

Provide a complete and detailed report on the impact on emergency services of the outage that began on 8 July 2022, including but not limited to:

With respect to wireless public alerting service ("WPAS"), the Rogers Broadcast Message Center ("BMC") platform was operable to receive alerts from Pelmorex, the WPAS administrator. However, broadcast-immediate ("BI") public alerts could not be delivered to any wireless devices across Rogers' coverage areas due to the outage. Based on a review of the alerts received into the WPAS BMC platform, the only impact occurred in the Province of Saskatchewan. There were four alerts, and associated updates, received but not delivered to wireless devices in Rogers' coverage area. There were no other alerts issued, as seen on our WPAS BMC platform.

I interpret this and other areas of the report to indicate that the IP core outage made the cell sites unreachable. To send an alert out to all devices in an area, you need to reach the cell sites that broadcast the message to phones.

With respect to broadcasting (cable TV/Radio) alerts, our alert hardware is connected to our IP network. Since we had no connection to the Internet on July 8th, we were unable to send out any alerts on that day in the regions that we were serving.

And alerts on services other than cell sites also all use the IP network.

Whether the outage specifically impacted the 9-1-1 networks or only the originating networks, and if the former, how was this possible in light of resiliency and redundancy obligations imposed by the Commission

The outage solely impacted Rogers’ originating network. The 9-1-1 networks that receive calls from originating networks are not operated by Rogers. Rather, they are operated by the three large Canadian Incumbent Local Exchange Carriers (“ILECs”). They were unaffected by the outage.

This is basically saying that Rogers does not operate the 911 networks themselves; those are operated by TELUS, Bell, and SaskTel across Canada. So the impact was isolated to Rogers' network being able to deliver emergency calls to the 911 networks.

Number of public alerts sent that did not reach Rogers’ customers, broken down by province;

Only four (4) alerts were received on Rogers WPAS BMC platform on July 8th. All alerts were in the Province of Saskatchewan. No other alert was issued in Canada on that day:

1. WPAS ID 957:  7:40AM CST: Saskatchewan RCMP – Civil Emergency (Dangerous Person)
2. WPAS ID 960:  4:05PM CST: Environment Canada – Tornado (Warning)
3. WPAS ID 964:  4:19PM CST: Environment Canada – Tornado (Warning)
4. WPAS ID 982:  5:31PM CST: Environment Canada – Tornado (Warning)

I'm just including this as it adds perspective on the number of alerts Rogers wasn't able to deliver across the country that day.

How were 9-1-1 calls processed during the outage and whether they were able to be processed by other wireless networks within the same coverage area

As seen in Rogers(CRTC)11July2022-2.i above, Rogers was able to route thousands of 9-1-1 calls on July 8th.  Rogers’ wireless network worked intermittently during that day as we were trying to restore our IP core network, varying region by region.

...

The connection state of the UE to Rogers wireless network, and the stability of our network, determined the ability of Rogers wireless customers to have their 9-1-1 calls processed by other wireless networks within the same coverage area.

Bell and TELUS confirmed to us that some of our customers were able to connect to their wireless networks in order to place 9-1-1 calls.

The UE is the User Equipment, basically the cell phone.

I think what Rogers is trying to say is that if the cell phone was in a state where it didn't see the Rogers network, like "No Service" on the display, an emergency call would trigger a network scan and attach to another network with service.

Whether other measures could have been taken to re-establish 9-1-1 services sooner

No other measures would have helped restore 9-1-1 service on July 8th. One possible option that was explored by Rogers was to shut down our RAN. Normally, if a customer’s device cannot connect to their own carrier’s RAN, they will automatically connect to the strongest signal available, even from another carrier, for the purpose of making a 9-1-1 call. However, since Rogers’ RAN remained in service on July 8th, many Rogers customers’ phones did not attempt to connect to another network.

If this is the case, what the report doesn't talk about at all is that non-Rogers customers could have been impacted by this outage as well. If a competitor's device had very weak coverage, it might scan, find the Rogers network, and try to use it for the emergency call instead of finding a working network. Or a roaming device coming out of airplane mode that needs to make an emergency call will try to attach to the first network it finds.

So if this was the case, the Rogers network behaving like this could have prevented emergency calls from reaching a working network. This certainly would have affected only a minority of devices, but I find it infuriating that the network sat there for hours advertising a 911 service it could not deliver.

What alternatives are available to Rogers’ customers to access 9-1-1 services during such outages

The GSM standard for the routing of 9-1-1 calls implies that a wireless customer always has the option to remove the SIM card from their device and then to place the 9-1-1 call. The handset will register to another wireless network (the one with the strongest signal, even if there are not roaming arrangements).

I think the problem is most people wouldn't even know what their SIM card is, let alone that it can be removed to make a 911 call. But as above, the problem here appears to be that Rogers continued to advertise a 911 network, so the device might still connect to it.

I'm also not sure that "strongest signal" is correct. With so many frequencies now in use for cellular networks, a full network scan can take quite some time, so I'm not sure devices in this state scan all possible networks and then choose the strongest signal. I think, but am not sure, that the device connects to the first network found.

Further, some newer smart devices have the capability to reconnect automatically to other wireless network for 9-1-1 calls when the home network is down. 

If that's the case, that's great. It would probably be a good idea to put this into the standard.

Summary

I really wanted to dig into this report, as I've worked these sorts of outages. They are really difficult: there are infinite ways to make the problem worse, plenty of contradictory information, competing theories of the root cause, and missing information. It is genuinely hard to operate these networks.

But what I found most frustrating was the 911 services. The core network was down, but the radio network continued to advertise 911 service. While I can only suspect this is what happened, there isn't a good reason the other networks couldn't have been used, other than the will to make it happen. I really hope this lesson doesn't get lost on the major carriers after this outage.
