

No problem. I think this is a great “final boss” question for learning sed, because it turns out to be deceptively hard!! You have to understand a lot not only about regex but also about sed to get it right. I learned a lot about sed just by tackling this problem!
> I really do not want to mess around with your regex
It is very delicate for sure, but one part you can safely change is the `# Add hyphens` part. In the regex you can see `(%20|\.)`. This is a list of “characters” that get converted to hyphens. For example, you could modify it to `(%20|\.|\+)` and it will convert `+`s to `-`s as well!
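To make that concrete, here is a minimal sketch of just the alternation on its own. This is not the full script from this thread: the `hyphenate` helper and the sample input are made up for illustration, and it replaces the characters everywhere in the string rather than only inside link anchors. It assumes a sed that supports `-E` (GNU and BSD sed both do).

```shell
# Hypothetical helper: applies only the hyphen substitution, in isolation.
# (%20|\.|\+) is the extended list: encoded spaces, dots, and plus signs.
hyphenate() {
  printf '%s\n' "$1" | sed -E 's/(%20|\.|\+)/-/g'
}

hyphenate 'my%20section.name+extra'   # prints: my-section-name-extra
```

Adding another character is just one more `|...` alternative inside the parentheses, as long as you escape anything that is special in an extended regex.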
Still it is not perfect:
- If the link spans multiple lines, the regex won’t match
- If the link contains escaped characters, like `\[LINK](#LINK)` or an escaped `\]` right before the `(`, the regex can get confused
- If the link is inside a code block (```), it will get changed (which may or may not be intended)
But for a sed-only solution this is about as good as it will get I’m afraid.
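The multi-line limitation comes from sed applying its commands one line at a time by default. Here is a rough sketch of that behaviour. It uses a much simpler, made-up link pattern than the real one, and `mark_links` is a hypothetical name:

```shell
# Hypothetical, simplified pattern for a Markdown anchor link: [text](#anchor)
mark_links() {
  sed -E 's/\[[^]]*\]\(#[^)]*\)/MATCH/g'
}

# Link on a single line: the pattern matches.
printf '%s\n' 'see [Intro](#intro) here' | mark_links
# prints: see MATCH here

# Same link split across two lines: each line is checked separately,
# so neither one matches and both come out unchanged.
printf '%s\n' 'see [Intro' 'text](#intro) here' | mark_links
```

You can work around this by pulling extra lines into the pattern space (for example with `N`), but that makes an already delicate script even harder to reason about.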
Overall I’m very happy with it. Someday I would like to make a video that goes into depth about sed, since it is tricky to learn just from the docs.
This might not be what you meant, but the word “tar” made me think of tar.gz. Don’t most sites compress the HTTP response body with gzip? What’s to stop you from sending a zip bomb over the network?