# RushLink
A URL shortener and (maybe) a pastebin server for our #ru community.
## Building
- `go get -u github.com/go-bindata/go-bindata/...`
- `go generate ./...`
- `go build ./cmd/rushlink`
## Deploying
We recommend running `rushlink` behind a reverse proxy suitable for processing
HTTP requests, such as `nginx` or `haproxy`.
## Sample `nginx` config
```
server {
    location / {
        root /var/www/rushlink;
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host rushlink.local;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;
    }
}
```
`rushlink` automatically detects whether `http` or `https` is used when
`X-Forwarded-Proto` is correctly set. Otherwise, pass `-root_url
https://rushlink.local` to the binary (e.g. in the `systemd` unit file).
## Sample `systemd` unit file
As of 1fe9553cc9, `rushlink` expects its database and file store configuration
in environment variables.
```
[Install]
WantedBy=nginx.service
[Service]
Type=simple
User=rushlink
Group=nogroup
Environment=RUSHLINK_DATABASE_DRIVER=sqlite RUSHLINK_DATABASE_PATH=/var/lib/rushlink/rushlink.sqlite3 RUSHLINK_FILE_STORE_PATH=/var/lib/rushlink/filestore/
ExecStart=/var/lib/rushlink/rushlink -root_url https://rushlink.local
```
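
For illustration, reading these three variables in Go could look roughly like the sketch below. The `config` struct, `loadConfig`, and the rule that all three variables are required are assumptions for this example; check the source for the actual behaviour.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// config mirrors the three variables from the unit file above. The struct and
// its field names are hypothetical.
type config struct {
	DatabaseDriver string
	DatabasePath   string
	FileStorePath  string
}

// loadConfig reads the settings through getenv (os.Getenv in production) and
// fails early when one of them is missing. Requiring all three is an
// assumption of this sketch.
func loadConfig(getenv func(string) string) (config, error) {
	c := config{
		DatabaseDriver: getenv("RUSHLINK_DATABASE_DRIVER"),
		DatabasePath:   getenv("RUSHLINK_DATABASE_PATH"),
		FileStorePath:  getenv("RUSHLINK_FILE_STORE_PATH"),
	}
	if c.DatabaseDriver == "" || c.DatabasePath == "" || c.FileStorePath == "" {
		return config{}, errors.New("missing RUSHLINK_* environment variable")
	}
	return c, nil
}

func main() {
	c, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", c)
}
```

Passing `getenv` as a parameter keeps the function testable without touching the real process environment.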
---
# Background
## Libraries
Use the Go standard library if the job can be done with it. As of now, these
are the exceptions:
- `github.com/gorilla/mux` provides useful stuff for routing requests.
- `github.com/gorilla/sessions` for session management.
- `go.etcd.io/bbolt` is our database driver.
- `github.com/pkg/errors` provides a [`Wrap`] function.
- `github.com/prometheus/client_golang/prometheus/...` has easy Prometheus
functionality.
[`Wrap`]: https://godoc.org/github.com/pkg/errors#Wrap
## Database
Before 1fe9553cc9 we used [`go.etcd.io/bbolt`]. The database file was meant to
be the *only* file apart from our monolithic binary: all settings and keys
should go in there, and any read-only data resides in the binary file
(possibly compressed).
Now, we use Gorm and support SQLite and PostgreSQL as backends.
We provide a migration binary in `cmd/rushlink-migrate-db/`. The destination
database is configured through environment variables, and the binary itself
expects a few flags. Refer to the source for the exact flags.
[`go.etcd.io/bbolt`]: https://go.etcd.io/bbolt
## Namespacing
All shortened URLs exist as a key on the root of the webserver, i.e. `/xd42`.
That means that we have to separate every other page with some kind of
namespace. Ideas:
- `/z/` reserved for flat pages.
- `/z/static/` reserved for "static files".
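
As a rough sketch of the namespacing idea, a request path could be classified as follows. The function name `classify`, its return values, and the special case for `/` are hypothetical and only serve the example.

```go
package main

import (
	"fmt"
	"strings"
)

// classify reports which namespace a request path falls into. The more
// specific prefix has to be checked first.
func classify(path string) string {
	switch {
	case path == "/":
		return "index" // assumption: the root itself is not a paste key
	case strings.HasPrefix(path, "/z/static/"):
		return "static"
	case strings.HasPrefix(path, "/z/"):
		return "page"
	default:
		return "paste"
	}
}

func main() {
	for _, p := range []string{"/", "/xd42", "/z/about", "/z/static/style.css"} {
		fmt.Printf("%-22s %s\n", p, classify(p))
	}
}
```

Because the base64url alphabet contains no `/`, a generated key can never end up inside the `/z/` namespace.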
## Paste types
Previously, I was planning to put pastes (not redirects) in a separate bucket
and below a URL namespace. Then when one was created, we would immediately
generate a redirect link.
It turns out to be easier, though, to just save a unit (we call it a "paste")
which can be of different types. For example:
- Redirect
- Paste
- File
- Image
- Hypothetical other stuff:
  - Dead-drop
  - [...]?
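
A minimal sketch of how one stored unit with a type tag could be modelled in Go. All names here (`PasteType`, `Paste`, the field names) are hypothetical and not taken from the source.

```go
package main

import "fmt"

// PasteType is a hypothetical enumeration of the unit types listed above.
type PasteType int

const (
	// Zero is left unused so that an unset value is detectable.
	TypeRedirect PasteType = iota + 1
	TypePaste
	TypeFile
	TypeImage
)

// String makes the type readable in logs and plain-text output.
func (t PasteType) String() string {
	switch t {
	case TypeRedirect:
		return "redirect"
	case TypePaste:
		return "paste"
	case TypeFile:
		return "file"
	case TypeImage:
		return "image"
	default:
		return "unknown"
	}
}

// Paste is the single stored unit; the fields are illustrative only.
type Paste struct {
	Key     string
	Type    PasteType
	Content []byte
}

func main() {
	p := Paste{Key: "xd42", Type: TypeRedirect, Content: []byte("https://example.org")}
	fmt.Printf("/%s is a %s\n", p.Key, p.Type)
}
```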
## Shorten keys and collisions
First of all: A sextet is a value of 6 bits.
For generating keys, we will initially generate a random value of 4 sextets,
where the first bit is set to `0`. If this collides with an existing key, we
will generate a new one made out of 5 sextets, and set the prefix bits to
`0b10`. We will keep doing this until we don't have any collisions anymore.
To get proper-looking keys, we format the key to characters using the
base64url alphabet described in [RFC4648, par. 5]. The encoded value will
be saved in the database.
[RFC4648, par. 5]: https://tools.ietf.org/html/rfc4648#section-5
## UI design
As is tradition in a lot of URL-shortener/pastebin-like services, we will put
everything in a single `<pre>` tag, and if possible, just serve `text/plain`.
A good example is <https://0x0.st>.
The reason we would use `text/html` instead of `text/plain` is basically
form submissions and JavaScript. Our main API should be cURL, but it would be
useful if users could also use the website and/or drag-and-drop files and URLs.
On the other hand, using `text/plain` saves us *so much effort*, because we
don't have to do any HTML/CSS/JavaScript. We have native terminal support, etc.
The best thing would probably be to do both, and correctly listen to the `Accept`
header that the client sends. We can still wrap the plain-text page in a single
`<pre>` to keep it easy for ourselves.
## Retention
- If we can, we don't want to have user accounts. We store the sessions
forever, and store a user's data in there, without having to collect personal
data in any way.
- URL-shortening links will be retained forever, unless the submitter
revokes them, in which case they will be replaced by a `410 Gone` page[*].
- Retention of pastes is still an unsolved problem[*].
[*] In any case, we are going to comply with all European laws and reasonable
requests for deletion.
## Privacy
We will try as hard as possible to not store any data about our users, and will
only provide any data when we have the legal obligation to do so.