我这边有一个项目,现场用了70多台这样的设备。接到的反馈是,总有5-10台设备运行3-14天就完全没有反应,重启就正常。哪台设备,能正常运行多少天才死机,是随机的。后来最终定位是Lwip处理分片包数据确实是有BUG。经过3个月的时间,我修复了这个问题。其中包括长期运行了仅2个月来验证修复效果,没有一台出问题了。这里提供我的解决方案,大家可参考:
static int
ip_reass_remove_oldest_datagram(struct ip_hdr *fraghdr, int pbufs_needed)
{
/* @ToDo Can't we simply remove the last datagram in the
* linked list behind reassdatagrams?
*/
struct ip_reassdata *r, *oldest, *prev, *oldest_prev;
int pbufs_freed = 0, pbufs_freed_current;
int other_datagrams;
/* Free datagrams until being allowed to enqueue 'pbufs_needed' pbufs,
* but don't free the datagram that 'fraghdr' belongs to! */
do {
oldest = NULL;
prev = NULL;
oldest_prev = NULL;
other_datagrams = 0;
r = reassdatagrams;
while (r != NULL) {
if (!IP_ADDRESSES_AND_ID_MATCH(&r->iphdr, fraghdr)) {
/* Not the same datagram as fraghdr */
other_datagrams++;
if (oldest == NULL) {
oldest = r;
oldest_prev = prev;
} else if (r->timer <= oldest->timer) {
/* older than the previous oldest */
oldest = r;
oldest_prev = prev;
}
}
if (r->next != NULL) {
prev = r;
}
r = r->next;
}