This resolves https://github.com/ledgerwatch/erigon/issues/10135
All enums are constrained by their owning type, which forces package
inclusion and hence type registration.
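As a rough illustration of the pattern (all names below are illustrative, not the actual snaptype API), each enum value is a typed variable in its owning package, so using any value forces the import that performs the registration:

```go
// Illustrative sketch of the "enum constrained by owning type" pattern;
// the package and names below are made up, not the actual snaptype API.
package borsnaptype

// Type is the owning type; every enum value has this type.
type Type struct{ Name string }

var registry = map[string]Type{}

func register(t Type) Type {
	registry[t.Name] = t // runs during package initialisation
	return t
}

// Using any of these values forces an import of this package,
// which guarantees the registration above has already run.
var (
	Checkpoints = register(Type{Name: "checkpoints"})
	Milestones  = register(Type{Name: "milestones"})
)
```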
Added tests for each type to check the construction cycle.
When adding bor waypoint types I have removed snaptype.AllTypes because
it causes package cross-dependencies.
This fixes the places where all types were used after the merge
changes.
Implementation of db and snapshot storage for additional synced heimdall
waypoint types:
* Checkpoints
* Milestones
This is targeted at the Astrid downloader which uses waypoints to verify
headers during syncing and fork choice selection.
Since the introduction of milestones in heimdall, these types are
downloaded by erigon but not persisted locally. This change adds
persistence for these types.
In addition to the pure persistence changes, this PR also contains a
refactor step which is part of the process of extracting polygon-related
types from erigon core into a separate package, which may eventually be
extracted to a separate module and possibly a separate repo.
The aim is that, rather than the core `turbo/snapshotsync/freezeblocks`
having to know about the types it manages and how to extract and index
their contents, it can concern itself with a set of macro shard
management actions.
This process is partially completed by this PR; a final step will be to
remove BorSnapshots and to simplify the places in the code which have to
remember to deal with them. This requires further testing, so it has been
left out of this PR to avoid delays in delivering the base types.
# Status
* Waypoint types and storage are complete and integrated into the
BorHeimdall stage. The code has been tested to check that types are
inserted into mdbx, extracted and merged correctly
* I have verified that when produced from block 0, the new snapshots
correctly follow the merging strategy of existing snapshots
* The functionality is enabled by **--bor.waypoints=true**, which is
false by default.
# Testing
This has been tested as follows:
* Run a Mumbai instance to the tip and check current processing for
milestones and checkpoints
# Post merge steps
* Produce and release snapshots for mumbai and bor mainnet
* Check existing node upgrades
* Remove the --bor.waypoints flag
For periods where there are not many sync events (mostly testnets), sync
event fetching can be slow because sync events are fetched at the end of
every sprint.
Fetching the next sync event and looking at its block number optimizes
this, because fetches can be skipped until the next known block with sync
events.
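A minimal sketch of the idea (names and types are illustrative, not the actual heimdall client API): cache the block number of the next known sync event and skip per-sprint fetches until that block is reached.

```go
// Illustrative sketch only; types and names are not the actual heimdall client API.
package heimdall

// SyncEvent stands in for the real state sync event type.
type SyncEvent struct{ BlockNum uint64 }

type eventFetcher struct {
	nextEventBlock uint64 // block of the next known sync event; 0 means unknown
}

// fetchAtSprintEnd is called at the end of every sprint. fetchNext returns the
// events for blockNum plus the block number of the next known sync event.
func (f *eventFetcher) fetchAtSprintEnd(
	blockNum uint64,
	fetchNext func(uint64) ([]SyncEvent, uint64, error),
) ([]SyncEvent, error) {
	// Skip the remote call while the next known event is still in the future.
	if f.nextEventBlock != 0 && blockNum < f.nextEventBlock {
		return nil, nil
	}
	events, next, err := fetchNext(blockNum)
	if err != nil {
		return nil, err
	}
	f.nextEventBlock = next // remember when a fetch is needed again
	return events, nil
}
```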
TL;DR: on a reorg, the common ancestor block is not being published to
subscribers of newHeads
#### Expected behavior
If the reorg's common ancestor is 2, I expect 2 to be republished:
1, 2, **2**, **3**, **4**
#### Actual behavior
2 is not republished, and 3's parentHash points to a 2 header that was
never received
1, 2, **3**, **4**
This PR is the same thing as
https://github.com/ledgerwatch/erigon/pull/9738 except with a test.
Note... the test passes, but **this does not actually work in
production** (for Ethereum mainnet with prysm as external CL).
Why? Because in production, `h.sync.PrevUnwindPoint()` is always nil:
a5270bccf5/turbo/stages/stageloop.go (L291)
which means the initial "if block" is never entered, and thus we have
**no control** over incrementing/decrementing `notifyFrom` during reorgs:
a5270bccf5/eth/stagedsync/stage_finish.go (L137-L146)
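For reference, the behaviour we are after is roughly the following (a sketch only, not the actual `stage_finish.go` code):

```go
// Sketch only, not the actual stage_finish.go code: when the previous loop
// iteration unwound, move the notification start back to the unwind point so
// the header at the common ancestor is published again before the new chain.
package sketch

func notifyFromBlock(prevNotifiedTo uint64, prevUnwindPoint *uint64) uint64 {
	notifyFrom := prevNotifiedTo + 1
	if prevUnwindPoint != nil && *prevUnwindPoint < notifyFrom {
		notifyFrom = *prevUnwindPoint // re-publish the common ancestor block
	}
	return notifyFrom
}
```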
I don't know why `h.sync.PrevUnwindPoint()` is seemingly always nil, or
how the test can pass if it fails in prod. I'm hoping to pass the baton
to someone who might. Thank you @indanielo for original fix.
If we can figure this bug out, it closes #8848, closes #9568 and
closes #10056
---------
Co-authored-by: Daniel Gimenez <25278291+indanielo@users.noreply.github.com>
In the current Go 1.21 version used in the project, the `slices` package
is no longer an experimental feature and has entered the standard library.
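For illustration, code that previously imported `golang.org/x/exp/slices` can now use the standard library package directly:

```go
package main

import (
	"fmt"
	"slices" // standard library since Go 1.21, replacing golang.org/x/exp/slices
)

func main() {
	s := []int{3, 1, 2}
	slices.Sort(s)
	fmt.Println(slices.Contains(s, 2), s) // true [1 2 3]
}
```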
Co-authored-by: alex.sharov <AskAlexSharov@gmail.com>
In PR:
- the new .lock format introduced by
https://github.com/ledgerwatch/erigon/pull/9766 is not backward
compatible. In the past an "empty .lock" meant "all prohibited"; it
was changed to mean "all allowed".
- commit
Not in PR: I have an idea to make .lock also forward compatible - by
making it a whitelist instead of a blacklist: after adding a new snap
type, it will not be downloaded by accident. Will do it in the next PR.
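A rough illustration of the difference (sketch only, not the current .lock handling code):

```go
// Sketch only, not the current .lock handling code.
package sketch

// Blacklist semantics: anything not listed is downloaded, so a newly added
// snapshot type gets downloaded by accident.
func allowedBlacklist(prohibited map[string]bool, fileType string) bool {
	return !prohibited[fileType]
}

// Whitelist semantics (the forward-compatible variant proposed above): only
// explicitly listed types are downloaded; a new type stays off until added.
func allowedWhitelist(allowed map[string]bool, fileType string) bool {
	return allowed[fileType]
}
```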
But I need one more confirmation - why do we need exceptions from .lock?
Why are we breaking the "download once" invariant for some types of
files? Can we avoid it?
This PR introduces support for customising Silkworm RpcDaemon settings
in Erigon++.
Common RPC settings between Erigon and Silkworm are simply translated
from the existing Erigon command-line options. They include:
- `--http.addr`
- `--http.port`
- `--http.compression`
- `--http.corsdomain`
- `--http.api`
- `--ws`
- `--ws.compression`
Moreover, the following Silkworm-specific command-line options are
added:
- `--silkworm.verbosity`
- `--silkworm.contexts`
- `--silkworm.rpc.log`
- `--silkworm.rpc.log.maxsize`
- `--silkworm.rpc.log.maxfiles`
- `--silkworm.rpc.log.response`
- `--silkworm.rpc.workers`
- `--silkworm.rpc.compatibility`
Default values cover the common usages of Erigon++ experimental
features, yet such options can be useful for testing some corner cases
or collecting information.
Finally, this PR adds a new `LogDirPath` function to the `logging` module
to determine the log directory path used by Erigon, so that the Silkworm
RPC interface logs, when enabled, are also put there.
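A rough sketch of how such a helper might look (the actual signature in the `logging` package may differ; the flag name below is an assumption for illustration):

```go
// Sketch only; the actual LogDirPath signature and flag lookup may differ.
package logging

import (
	"path/filepath"

	"github.com/urfave/cli/v2"
)

// LogDirPath returns the directory used for Erigon log files, so other
// components (e.g. the Silkworm RPC interface logs) can be written there too.
func LogDirPath(ctx *cli.Context, dataDir string) string {
	dirPath := ctx.String("log.dir.path") // assumed flag name, for illustration
	if dirPath == "" {
		dirPath = filepath.Join(dataDir, "logs")
	}
	return dirPath
}
```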
**Summary**
Fixes prune point for log (+index)
- No need to use ETL again for deleting `kv.Log` entries; an `RwCursor`
in the initial loop suffices (see the sketch after this list)
- Put the last `pruneTo` block number in the `PruneState` - the next
prune will begin from that point. Previously the `pruneFrom` point being
passed in was buggy, as it was based on a different assumption for this value
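As a rough illustration of the first point (a sketch only, not the actual stage code), the deletion can be done with a single `RwCursor` walk over `kv.Log` between the prune bounds:

```go
// Sketch only, not the actual stage code: delete kv.Log entries below the
// prune point with a single RwCursor walk instead of a second ETL pass.
package sketch

import (
	"encoding/binary"

	"github.com/ledgerwatch/erigon-lib/kv"
)

func pruneLogs(tx kv.RwTx, pruneFrom, pruneTo uint64) error {
	c, err := tx.RwCursor(kv.Log)
	if err != nil {
		return err
	}
	defer c.Close()

	from := make([]byte, 8)
	binary.BigEndian.PutUint64(from, pruneFrom)

	for k, _, err := c.Seek(from); k != nil; k, _, err = c.Next() {
		if err != nil {
			return err
		}
		if binary.BigEndian.Uint64(k[:8]) >= pruneTo {
			break // reached the prune point
		}
		if err := c.DeleteCurrent(); err != nil {
			return err
		}
	}
	return nil
}
```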
We reverted support for this flag in `updateForkChoice` because the
implementation was too complex and fragile:
https://github.com/ledgerwatch/erigon/pull/9900
But it's good enough if StageSenders preserves this flag - then the next
stages (exec) will also follow (because they look at the previous stage's
progress).
It's good enough because users just want to save some partial progress
after restoring a node from backup (long downtime), and to enforce the
"all stages progress together" invariant.
This fixes this issue:
https://github.com/ledgerwatch/erigon/issues/9499
which is caused by restarting erigon during the bor-heimdall stage.
Previously, after the initial call to bor-heimdall (header 0), forward
downloading was disabled, but backward downloading recursively collects
headers, holding the results in memory until it can roll them forward.
This should only be done for a limited number of headers, otherwise it
leads to a large amount of memory (>45GB for Bor mainnet) being used if
the process is stopped at block 1.
The existing prune had some confusing logic (thanks to me).
In this PR, the logic is changed to:
if the receipt, or any individual log in the receipt, contains an address
in `noPruneContracts`, the entire log must be stored and indexed.
Also fixes a regression where logs were not getting indexed - this was
due to:
```
if l.BlockNumber < pruneBlock && cfg.noPruneContracts != nil && !cfg.noPruneContracts[l.Address] {
```
The individual log object doesn't have the block number set - something
that shouldn't matter here.
In this PR, the logic is changed to use `blockNum` from the outer loop
(see the sketch below).
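For clarity, the shape of the corrected check (a sketch, not a verbatim diff):

```
// before (buggy): l.BlockNumber is never set on the individual log object
if l.BlockNumber < pruneBlock && cfg.noPruneContracts != nil && !cfg.noPruneContracts[l.Address] {

// after (sketch): use blockNum from the outer loop over blocks
if blockNum < pruneBlock && cfg.noPruneContracts != nil && !cfg.noPruneContracts[l.Address] {
```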
### Change ###
Adds a `disableBlockDownload` boolean flag to the current implementation
of the sentry multi client to disable the built-in header and body
download functionality (see the sketch below).
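A minimal sketch of what the flag gates (illustrative only, not the actual multi client code):

```go
// Illustrative sketch only, not the actual sentry multi client code.
package sketch

type multiClient struct {
	disableBlockDownload bool
	// other fields elided
}

// onBlockHeadersMsg shows where the flag takes effect: when block download is
// disabled, incoming header messages are not fed into HeaderDownload at all,
// because Astrid consumes headers through its own downloader.
func (mc *multiClient) onBlockHeadersMsg(payload []byte) error {
	if mc.disableBlockDownload {
		return nil
	}
	// existing HeaderDownload / BodyDownload handling would run here
	return nil
}
```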
### Long Term ###
Long term we are planning to refactor sentry multi client and de-couple
it from custom header and body download logic.
### Context ###
Astrid uses its own body download logic which is de-coupled from sentry
multi client.
When both are used at the same time (using `--polygon.sync=true`) there
are 2 problematic scenarios:
- restarting Astrid takes a very long time due to the init logic of the
sentry multi client. It calls `HeaderDownload.RecoverFromDb`, which is
coupled to the Headers stage in the stage loop. So if Astrid has fetched
1 million headers but hasn't committed execution yet, this will result
in a very slow startup since all 1 million blocks have to be read from
the DB. Example logs:
```
[INFO] [04-16|12:55:42.254] [downloader] recover headers from db left=65536
...
[INFO] [04-16|13:03:42.254] [downloader] recover headers from db left=65536
```
- debug log messages warning about sentry consuming being slow, since
Astrid does not use `HeaderDownload` and `BodyDownload`, so there is
nothing consuming the headers and bodies from these data structures.
This has no logical impact, but it clogs resources. Example logs:
```
[DBUG] [04-16|14:03:15.311] [sentry] consuming is slow, drop 50% of old messages msgID=BLOCK_HEADERS_66
[DBUG] [04-16|14:03:15.311] [sentry] consuming is slow, drop 50% of old messages msgID=BLOCK_HEADERS_66
```
This PR adds headers which prevent the Cloudflare firewall from banning
the downloader's HTTP requests.
At the moment these are not obfuscated in the codebase.
This should be reviewed before this change is committed to the core
repo.
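As an illustration of the mechanism (a sketch only; the real header names and values are deliberately not shown here), extra headers can be attached to all outgoing downloader requests via a wrapping `http.RoundTripper`:

```go
// Sketch only; the real header names/values are deliberately not shown here.
package sketch

import "net/http"

// headerRoundTripper attaches extra HTTP headers to every outgoing downloader
// request, e.g. ones expected by the Cloudflare firewall.
type headerRoundTripper struct {
	base    http.RoundTripper
	headers map[string]string
}

func (t headerRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	req = req.Clone(req.Context()) // don't mutate the caller's request
	for k, v := range t.headers {
		req.Header.Set(k, v)
	}
	return t.base.RoundTrip(req)
}

// Usage sketch (placeholder header, not the real one):
//   client := &http.Client{Transport: headerRoundTripper{
//       base:    http.DefaultTransport,
//       headers: map[string]string{"X-Example-Header": "placeholder"},
//   }}
```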
# Sync Committee Contribution pooling
First of all, a distinction:
* Sync Committee Messages are single unaggregated messages from a
specific validator
* Sync Committee Contributions are aggregated messages from many
validators.
We get these 2 messages from 2 different gossip sources, and then, after
validating the Gossip rules, we send everything to the pool, which
aggregates the `sync committee` or `contribution` into another aggregate.
## Sync Committee subscription:
/eth/v1/validator/sync_committee_subscriptions
The subscription just subscribes to the `Sync Committee`'s Gossip
channel. It is actually really simple; here is the pseudo-code for how
the subscription happens for each `ValidatorIndex` requested:
```vb
Function PostEthV1ValidatorSyncCommitteeSubscriptions
  Request: ValidatorIndicies []uint64
  Get the head state of the blockchain
  For each subRequest in ValidatorIndicies do
    Compute the syncnets for the validator index from subRequest using the headState
    For each subnet in syncnets do
      Construct a topic name for the sync committee subnet
      Subscribe to the topic in sentinel with the calculated expiry time
    End For
  End For
End Function
```
### Extras
* /eth/v1/validator/contribution_and_proofs - Submit to the node a
contribution to be republished on the Gossip.
* /eth/v1/validator/sync_committee_contribution - Retrieve a
contribution (from the pool) that we have aggregated from the gossip.
* /eth/v1/beacon/pool/sync_committees - Submit to the node sync
committee messages to be republished on the Gossip.
---------
Co-authored-by: Kewei <kewei.train@gmail.com>
This PR splits the existing `silkworm_execute_blocks` endpoint in CAPI
and creates two separate endpoints:
- `silkworm_execute_blocks_ephemeral`: takes a db txn as a parameter and
uses it to flush the data, but does not commit the transaction
- `silkworm_execute_blocks_perpetual`: creates its own db txn, commits
the data and leaves the database in a coherent state
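Conceptually (a hedged sketch with made-up types, not the actual silkworm-go / CAPI signatures), the two variants differ only in who owns the transaction:

```go
// Conceptual sketch with made-up types; the real silkworm-go / CAPI signatures differ.
package sketch

type Txn interface {
	Commit() error
	Rollback() error
}

type DB interface{ BeginRw() (Txn, error) }

// executeBlocks stands in for the Silkworm block execution call.
func executeBlocks(txn Txn, fromBlock, toBlock uint64) error { return nil }

// Ephemeral variant: the caller supplies txn; data is flushed through it but
// the transaction is never committed here.
func executeBlocksEphemeral(txn Txn, fromBlock, toBlock uint64) error {
	return executeBlocks(txn, fromBlock, toBlock)
}

// Perpetual variant: Silkworm opens its own txn, commits the data and leaves
// the database in a coherent state.
func executeBlocksPerpetual(db DB, fromBlock, toBlock uint64) error {
	txn, err := db.BeginRw()
	if err != nil {
		return err
	}
	if err := executeBlocks(txn, fromBlock, toBlock); err != nil {
		txn.Rollback()
		return err
	}
	return txn.Commit()
}
```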
Associated PRs:
- https://github.com/erigontech/silkworm-go/pull/4
- https://github.com/erigontech/silkworm/pull/1917
---------
Co-authored-by: canepat <16927169+canepat@users.noreply.github.com>
Co-authored-by: battlmonstr <battlmonstr@users.noreply.github.com>
Replaced the `--override.cancun` flag with `--override.prague`.
Also, removed `Txpool.OverrideCancunTime` as its counterpart isn't (yet)
needed for Prague.
This salt is used for creating RecSplit indices. All nodes will have a
different salt, but one node will use the same salt for all its files.
It allows using the value of `murmur3(key)` to read from all files
(important for non-existing keys, which require checking all indices).
- add `snapshots/salt-blocks.txt`
- this PR doesn't require re-index
- it's step 1. In future releases we will add a data_migration script
which will "check if all indices use the same salt, or re-index" (but at
that time most users will not be affected).
- new indices will use `salt-blocks.txt`
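To illustrate why one node-wide salt matters (a sketch with stand-in types, not the actual RecSplit code), the salted hash is computed once and the same value can be probed against every file's index:

```go
// Sketch with stand-in types, not the actual recsplit code.
package sketch

// hashWithSalt stands in for the salted murmur3 hash used when building and
// querying RecSplit indices.
type hashWithSalt func(key []byte, salt uint32) uint64

// lookupAll hashes the key once with the node-wide salt (stored in
// snapshots/salt-blocks.txt) and probes every file's index with the same hash.
// If each file used its own salt, the key would have to be re-hashed per file.
func lookupAll(indices []map[uint64][]byte, hash hashWithSalt, salt uint32, key []byte) ([]byte, bool) {
	h := hash(key, salt)
	for _, idx := range indices {
		if v, ok := idx[h]; ok {
			return v, true
		}
	}
	return nil, false // non-existing key: every index had to be checked
}
```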
The responsibility to maintain the status data is moved from the
stageloop Hook and MultiClient to the new StatusDataProvider. It reads
the latest data from a RoDB when asked. That happens at the end of each
stage loop iteration, and sometimes when any sentry stream loop
reconnects a sentry client.
sync.Service and MultiClient require an instance of the
StatusDataProvider now. The MessageListener is updated to depend on an
external statusDataFactory.
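A rough sketch of the shape of this (simplified; the real StatusDataProvider differs):

```go
// Simplified sketch; the real StatusDataProvider has a different shape.
package sketch

import (
	"context"

	"github.com/ledgerwatch/erigon-lib/kv"
)

// StatusData stands in for the sentry status message payload.
type StatusData struct {
	HeadHeight uint64
	HeadHash   [32]byte
}

// StatusDataProvider reads the latest status from a read-only DB on demand,
// instead of the stage-loop Hook and MultiClient caching and pushing it around.
type StatusDataProvider struct {
	db kv.RoDB
}

// GetStatusData is called at the end of each stage loop iteration and when a
// sentry stream loop reconnects a sentry client.
func (p *StatusDataProvider) GetStatusData(ctx context.Context) (StatusData, error) {
	var s StatusData
	err := p.db.View(ctx, func(tx kv.Tx) error {
		// read the head header / total difficulty from tx and fill s
		// (elided in this sketch)
		return nil
	})
	return s, err
}
```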
This PR contains changes related to gathering information about the
"Bodies" stage.
The list of changes is:
- added entities for block download, write, process and processing
- added listeners to collect info for the above
- added an API to query this data